ESTIMATION BY DOUBLE SAMPLING


By D. R. COX 


Statistical Laboratory, University of Cambridge 


1. INTRODUCTION 


This paper is about the following problem: to estimate an unknown parameter $\theta$ with
assigned accuracy using as few observations as possible. The simplest case is to estimate
$\theta$ with given standard error. More generally we may require the estimate to have variance
$a(\theta)$, some given function of $\theta$; an important special case is $a(\theta) = a\theta^2$,
corresponding to a given fractional standard error $a^{1/2}$. Another possibility is to estimate
by a confidence interval of given confidence coefficient and predetermined form, for example,
of given width. Some practical situations in which these problems are relevant are discussed
briefly in §6.

In general it is impossible to get an estimate with the required properties by taking 
a sample of some fixed size. The number of observations must depend in some way on the 
observations themselves, i.e. some form of sequential sampling must be used. Usually, 
however, it is extremely difficult to construct a sequential sampling scheme leading to an 
estimate with the required properties, although Anscombe (1949, 1952) has given a general
large sample theory. Another disadvantage of ordinary sequential procedures is the
step-by-step calculation involved; in many applications this precludes the use of any but the
very simplest sequential methods.

Here the above problems are solved by double sampling. The basic idea is familiar (see, 
for example, Quenouille, 1950); a preliminary sample is used to determine how large the 
total sample should be. Stein’s (1945) elegant method for constructing a confidence interval 
for a normal mean of assigned width and confidence coefficient is a special case where the 
exact distribution theory is known. The present double-sampling methods are different 
from those used in industrial inspection, in that in the latter case the second sample, if 
taken, is of fixed size. 

The theory developed below is a large sample one, but in practice is likely to give
reasonable approximations even with quite small samples. For example, we shall construct
an estimate of $\theta$ with bias $O(N^{-2})$ and variance $a(\theta)[1 + O(N^{-2})]$, where $N$ is
the preliminary sample size and $a(\theta)$ the assigned variance.

For any $\theta$ the mean sample size for the double-sampling solution will be greater than the
corresponding mean sample size for the 'best' sequential procedure. However, we shall
show that the difference is likely to be small except when the sample sizes are very small.


2. ESTIMATION WITH GIVEN VARIANCE: ONE UNKNOWN PARAMETER 
2.1. Theory
Suppose that there is one unknown parameter $\theta$ and that we require to estimate it with
variance equal to a given function of $\theta$. The main applications are to the estimation of
Poisson and binomial means and of a normal mean when the population variance is known. We give
the large sample theory in which the assigned variance is small. To establish clearly the
relative magnitude of the various terms we shall consider a sequence $S_\lambda$ of problems,
letting the parameter $\lambda$ tend to infinity. $S_\lambda$ is the problem: estimate $\theta$ with
variance $a(\theta)/\lambda$.
Assume that in random samples of any fixed size $m$ we can construct an estimate $t^{(m)}$ of
$\theta$ such that

(i) $t^{(m)}$ is an unbiased estimate of $\theta$ with variance $v(\theta)/m$;

(ii) the skewness coefficient $\gamma_1$ of $t^{(m)}$ is asymptotically $\gamma_1(\theta) m^{-1/2}$
and the kurtosis coefficient $\gamma_2$ of $t^{(m)}$ is $O(m^{-1})$ as $m$ tends to infinity;

(iii) asymptotic means and standard errors can be derived for $a(t^{(m)})$, $v(t^{(m)})$ and
combinations of these functions by expansion in series.

These assumptions are sufficiently general for the present applications. There is no 
difficulty in considering estimates with non-zero asymptotic skewness or kurtosis. 

Now if $\theta$ were known, a sample of size $\lambda n_0(\theta) = \lambda v(\theta)/a(\theta)$ would
give the required accuracy. This suggests the following approximate sampling scheme:

(a) Take a random sample of size $N\lambda$ and let $t_1$ be the estimate of $\theta$ from it.

(b) Let
$$n_0(t_1) = v(t_1)/a(t_1). \tag{1}$$

(c) Take a second random sample of size $\max[0, \{n_0(t_1) - N\}\lambda]$. (If
$\{n_0(t_1) - N\}\lambda$ is not an integer take the nearest integer to it.) Let $t_2$ be the
estimate of $\theta$ from the second sample.

(d) Put
$$t = \begin{cases} \dfrac{N t_1 + \{n_0(t_1) - N\}\, t_2}{n_0(t_1)} & \text{if } n_0(t_1) \ge N, \\[4pt] t_1 & \text{if } n_0(t_1) < N. \end{cases} \tag{2}$$

In particular, if $t^{(m)}$ is the sample mean, $t$ is the mean of the pooled sample.

We shall say that the sampling scheme (a)-(d) is based on the estimate $t^{(m)}$.
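The steps (a)-(d) translate almost directly into code. The following is a minimal sketch, not part of the original paper: the function names `basic_double_sample_plan` and `pooled_estimate` are hypothetical, and the variance functions `v` and `a` are supplied by the caller.

```python
def basic_double_sample_plan(t1, v, a, N, lam=1.0):
    """Steps (b)-(c): return n0(t1) = v(t1)/a(t1) and the second-sample size.

    t1  : estimate from the preliminary sample of size N*lam
    v,a : callables giving v(theta) and the assigned variance function a(theta)
    """
    n0 = v(t1) / a(t1)
    # Nearest integer to (n0 - N)*lam, never negative.
    second_size = max(0, round((n0 - N) * lam))
    return n0, second_size


def pooled_estimate(t1, t2, n0, N):
    """Step (d): weighted combination of the two estimates, or t1 alone."""
    if n0 < N:
        return t1
    return (N * t1 + (n0 - N) * t2) / n0


# Illustrative numbers only: v = 10, a = 0.25, preliminary size N = 10.
n0, m2 = basic_double_sample_plan(1.0, v=lambda th: 10.0, a=lambda th: 0.25, N=10)
t = pooled_estimate(1.0, 2.0, n0, 10)
```

When the estimate is a sample mean, `pooled_estimate` is simply the mean of the combined sample, as noted in the text.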

That the estimate $t$ has the required properties in large samples scarcely needs formal
proof. The proof is, however, outlined here in order to show the method of obtaining closer
approximations to the exact theory. The precise statement to be proved is that $t$ has bias
$O(\lambda^{-1})$ and variance $a(\theta)\lambda^{-1}[1 + O(\lambda^{-1})]$.

Repeated application is made of the following property of conditional expectations (see, 
for example, Kolmogoroff (1950)). 


LEMMA. Let $X$, $Y$ be random variables and let $E(Y \mid X)$ denote the conditional expectation
of $Y$ given $X$. Then
$$E[XY] = E[X\,E(Y \mid X)]. \tag{3}$$

To obtain the expectation of $t$ we have
$$E(t) = (1-\pi)\{N E_1[t_1 m_0(t_1)] + E_1[(1 - N m_0(t_1))\, t_2]\} + \pi E_2(t_1), \tag{4}$$
where
$$m_0(t_1) = 1/n_0(t_1) = a(t_1)/v(t_1), \tag{5}$$
$\pi = \operatorname{prob}(n_0(t_1) < N)$, and $E_1$, $E_2$ denote conditional expectations given that
$n_0(t_1) \ge N$ and $n_0(t_1) < N$ respectively. Now by (iii) $E[n_0(t_1)] \sim n_0(\theta)$ and
$\operatorname{var}[n_0(t_1)] = O(\lambda^{-1})$. Thus if $N < n_0(\theta)$ and the distribution of
$n_0(t_1)$ dies away exponentially in the lower tail, $\pi = O(\lambda^{-r})$ for all $r$.† If, in
addition, $m_0(t_1)$ is well behaved, the possibility that $n_0(t_1) < N$ may be neglected in (4).
In future, then, we assume that

(iv) $N < n_0(\theta)$ and the distribution of $n_0(t_1)$ is such that the event $n_0(t_1) < N$ may
be ignored.

Then
$$E(t) = N E[t_1 m_0(t_1)] + E[(1 - N m_0(t_1))\, t_2].$$

† This is a much stronger result than is needed.

By the lemma the second term is $E[\{1 - N m_0(t_1)\}\, E(t_2 \mid t_1)]$ and $E(t_2 \mid t_1) = \theta$,
since the second sample is independent of the first and $t_2$ is unbiased. Thus
$$E(t) = N E[t_1 m_0(t_1)] + \theta - N\theta\, E[m_0(t_1)]. \tag{6}$$
By (iii)
$$E[t_1 m_0(t_1)] = \theta m_0(\theta) + O(\lambda^{-1}),$$
and
$$E[m_0(t_1)] = m_0(\theta) + O(\lambda^{-1}).$$
Thus
$$E(t) = \theta + O(\lambda^{-1}). \tag{7}$$
Similarly,
$$\operatorname{var}(t) = a(\theta)\lambda^{-1}[1 + O(\lambda^{-1})]. \tag{8}$$


Equations (7) and (8) show that t has the required properties when the preliminary sample 
size is large. Although for many purposes the above solution will be sufficiently accurate, 
it is worth while taking the analysis a step further in order to obtain closer approximations 
to the exact sampling theory. 

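Consider the sampling procedure defined by steps (a)-(d) with $n_0(t_1)$ replaced by $n(t_1)$
defined by
$$n(t_1) = n_0(t_1)\{1 + b(t_1)/\lambda\}, \tag{9}$$
where $b(t_1)$ is a function to be determined. Equation (6) now becomes
$$E(t) = N E[t_1 m_0(t_1)\{1 - b(t_1)/\lambda\}] + \theta - N\theta\, E[m_0(t_1)\{1 - b(t_1)/\lambda\}] + O(\lambda^{-2}). \tag{10}$$
But by (iii)
$$E[t_1 m_0(t_1)] = \theta m_0(\theta) + \frac{1}{2}\,\frac{\partial^2}{\partial\theta^2}\{\theta m_0(\theta)\}\,\frac{v(\theta)}{N\lambda} + O(\lambda^{-2}),$$
and there are similar formulae for the other expectations in (10). After some reduction we get
$$E(t) = \theta + m_0'(\theta)\, v(\theta)\, \lambda^{-1} + O(\lambda^{-2}). \tag{11}$$
Put
$$t' = \begin{cases} t - m_0'(t)\, v(t)\, \lambda^{-1} & \text{if } N \le n(t_1), \\ t_1 & \text{if } N > n(t_1). \end{cases} \tag{12}$$
Then (11) shows that $t'$ has bias $O(\lambda^{-2})$.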
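The reduction from (10) to (11) is short enough to spell out. The following is a sketch of the leading-order calculation only; it assumes nothing beyond (i), (iii) and $\operatorname{var}(t_1) = v(\theta)/(N\lambda)$, and it drops the factors $b(t_1)/\lambda$, which contribute to the bias only at order $\lambda^{-2}$.

```latex
\begin{align*}
E[t_1 m_0(t_1)] &= \theta m_0(\theta)
   + \tfrac{1}{2}\,\frac{v(\theta)}{N\lambda}\,\{2 m_0'(\theta) + \theta m_0''(\theta)\}
   + O(\lambda^{-2}), \\
E[m_0(t_1)] &= m_0(\theta)
   + \tfrac{1}{2}\,\frac{v(\theta)}{N\lambda}\, m_0''(\theta) + O(\lambda^{-2}), \\
E(t) - \theta &= N\{E[t_1 m_0(t_1)] - \theta\, E[m_0(t_1)]\} + O(\lambda^{-2})
   = m_0'(\theta)\, v(\theta)\, \lambda^{-1} + O(\lambda^{-2}).
\end{align*}
```

The terms in $m_0''(\theta)$ cancel, which is why the bias correction in (12) involves only the first derivative of $m_0$.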
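We can derive $\operatorname{var}(t')$ by an analogous method; the answer is
$$\operatorname{var}(t') = a(\theta)\lambda^{-1} + \lambda^{-2}\Big\{2 m_0(\theta)\, m_0'(\theta)\, \gamma_1(\theta)\, v^{3/2}(\theta) + [m_0'(\theta)\, v(\theta)]^2 + 2 m_0''(\theta)\, m_0(\theta)\, v^2(\theta) + v^2(\theta)\, m_0''(\theta)/(2N) - a(\theta)\, b(\theta)\Big\} + O(\lambda^{-3}). \tag{13}$$
We require $t'$ to have as nearly as possible a variance $a(\theta)/\lambda$. Therefore choose
$b(\theta)$ to make the second term in (13) vanish; i.e. put
$$b(\theta) = n_0(\theta)\, v(\theta)\Big\{2 m_0'(\theta)\, m_0(\theta)\, \gamma_1(\theta)\, v^{-1/2}(\theta) + m_0'^{\,2}(\theta) + 2 m_0''(\theta)\, m_0(\theta) + m_0''(\theta)/(2N)\Big\}. \tag{14}$$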
We can summarize the result as follows:
(a) Take a preliminary sample of size $N\lambda$ and let $t_1$ be the estimate of $\theta$ from it.
(b) Let $n(t_1) = n_0(t_1)\{1 + b(t_1)/\lambda\}$, where $n_0(t_1)$ and $b(t_1)$ are defined by (1) and (14).
(c) Take a second sample of $\max[0, \{n(t_1) - N\}\lambda]$ and let $t_2$ be the estimate of $\theta$ from it.
(d) Define $t$ by equation (2) and then $t'$ by (12). Then under assumptions (i)-(iv) $t'$ has
bias $O(\lambda^{-2})$ and variance $a(\theta)\lambda^{-1}[1 + O(\lambda^{-2})]$.
Special cases of these formulae are discussed in the next section.
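Formula (14) is easy to mis-evaluate by hand, so a small numerical check may help. The sketch below is illustrative only: `b_correction` is a hypothetical helper that takes the values of $n_0$, $v$, $m_0$, $m_0'$, $m_0''$ and $\gamma_1$ at a given point, and the check uses the normal-mean case with $a(\theta) = a\theta^2$, for which the closed form is $b(\theta) = 8a + \sigma^2/(N\theta^2)$.

```python
def b_correction(n0, v, m0, dm0, d2m0, gamma1, N):
    """Evaluate b(theta) from (14), given the ingredients at theta.

    dm0 and d2m0 are the first and second derivatives of m0 = a/v.
    """
    return n0 * v * (
        2.0 * dm0 * m0 * gamma1 / v ** 0.5
        + dm0 ** 2
        + 2.0 * d2m0 * m0
        + d2m0 / (2.0 * N)
    )


# Normal mean, known variance sigma2, a(theta) = a * theta**2 (assumed values).
sigma2, a, theta, N = 1.0, 0.01, 2.0, 10
m0 = a * theta ** 2 / sigma2        # m0 = a(theta)/v(theta)
dm0 = 2.0 * a * theta / sigma2      # m0'
d2m0 = 2.0 * a / sigma2             # m0''
b = b_correction(1.0 / m0, sigma2, m0, dm0, d2m0, gamma1=0.0, N=N)
closed_form = 8.0 * a + sigma2 / (N * theta ** 2)
```

Both `b` and `closed_form` should agree up to rounding error.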
2.2. Applications


In developing the above formulae we considered a sequence of sampling schemes in which,
as the parameter $\lambda$ tends to infinity, the required variance $a(\theta)/\lambda$ tends to zero
and the preliminary sample size $N\lambda$ tends to infinity. In any particular application we have
one particular variance function which must be small and one particular preliminary sample
size which must be large. In discussing applications we can therefore set $\lambda$ equal to unity
without loss of generality.

Example 1. Estimation of normal mean $\theta$ with given standard error $a^{1/2}$. This is a trivial
case when the population variance $\sigma^2$ is known and the estimator is the sample mean. We have
$n_0(\theta) = \sigma^2 a^{-1}$ and $b(\theta) = 0$. Thus the total sample size is constant. The
procedure is equivalent to taking a single sample of size the nearest integer to $\sigma^2/a$. This
is the common-sense solution and is optimum (§4).

Example 2. Estimation of normal mean $\theta$ with given fractional standard error $a^{1/2}$. The
population variance is again supposed known. We have
$$a(\theta) = a\theta^2, \quad n_0(\theta) = \sigma^2/(a\theta^2) \quad \text{and} \quad b(\theta) = 8a + \sigma^2/(N\theta^2).$$
Thus the total sample size is
$$n(t_1) = \sigma^2/(a t_1^2) + 8\sigma^2/t_1^2 + \sigma^4/(N a t_1^4), \tag{15}$$
where $t_1$ is the mean of the preliminary sample. The final estimate is $t' = t(1 - 2a)$, where
$t$ is the mean of the combined sample. The preliminary sample size $N$ should be chosen as
large as possible subject to $N < \sigma^2/(a\theta^2)$, i.e. it is desirable to know an upper limit
to $|\theta|$. The procedure fails if $\theta$ is very near to zero because the function
$n_0(\theta) = \sigma^2/(a\theta^2)$ has a singularity at $\theta = 0$, invalidating the expansions
used in §2.1. In practice if $\theta$ is likely to be equal or near to zero we should modify
$a(\theta)$ so that $a(0)$ is finite and non-zero, making $n_0(0)$ finite. This would be roughly
equivalent to truncating the procedure (15).
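As a quick illustration of (15), the sketch below evaluates the total sample size and the bias-corrected estimate for assumed values of $\sigma^2$, $a$, $N$ and $t_1$. The function names are hypothetical and the numbers are chosen only to make the arithmetic easy to follow.

```python
def example2_total_size(t1, sigma2, a, N):
    """Total sample size n(t1) from (15) for a normal mean, known variance."""
    return (
        sigma2 / (a * t1 ** 2)
        + 8.0 * sigma2 / t1 ** 2
        + sigma2 ** 2 / (N * a * t1 ** 4)
    )


def example2_estimate(t, a):
    """Bias-corrected final estimate t' = t(1 - 2a)."""
    return t * (1.0 - 2.0 * a)


# Assumed inputs: sigma^2 = 1, 10% fractional standard error (a = 0.01),
# preliminary sample of N = 10 with mean t1 = 2.
n_total = example2_total_size(2.0, sigma2=1.0, a=0.01, N=10)
t_final = example2_estimate(2.0, a=0.01)
```

Here the first-order size $\sigma^2/(a t_1^2)$ is 25, and the two correction terms add 2 and 0.625, which shows how modest the second-order adjustment is once $N$ is moderately large.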

Example 3. Estimation of a binomial mean $\theta$ with given fractional standard error $a^{1/2}$.
If the procedure is based on the sample proportion defective, we have
$$a(\theta) = a\theta^2, \quad v(\theta) = \theta(1-\theta), \quad \gamma_1(\theta) = (1-2\theta)[\theta(1-\theta)]^{-1/2}.$$
The total sample size is
$$n(t_1) = (1-t_1)/(a t_1) + 3[t_1(1-t_1)]^{-1} + (a N t_1)^{-1}, \tag{16}$$
and the final estimate is $t' = t - a t (1-t)^{-1}$. We require $N < (1-\theta)/(a\theta)$. The
procedure fails if $\theta$ is very small for the same reason as in Example 2. Equation (16) gives
the double-sampling solution corresponding to Haldane's (1945) inverse binomial sampling.
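The binomial case can be evaluated in the same way. This is a minimal sketch with hypothetical function names and assumed inputs, intended only to show the relative size of the three terms in (16).

```python
def example3_total_size(t1, a, N):
    """Total sample size n(t1) from (16) for a binomial proportion."""
    return (1.0 - t1) / (a * t1) + 3.0 / (t1 * (1.0 - t1)) + 1.0 / (a * N * t1)


def example3_estimate(t, a):
    """Bias-corrected final estimate t' = t - a*t/(1 - t)."""
    return t - a * t / (1.0 - t)


# Assumed inputs: a = 0.01, preliminary sample N = 50, observed proportion 0.2.
n_total = example3_total_size(0.2, a=0.01, N=50)
t_final = example3_estimate(0.2, a=0.01)
```

With these inputs the leading term is 400 and the corrections add 18.75 and 10; the condition $N < (1-\theta)/(a\theta)$ is comfortably met if $\theta$ is near 0.2.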


3. ESTIMATION IN THE PRESENCE OF A NUISANCE PARAMETER 
3.1. Theory

§2 gave the theory of double-sample estimation when there is one unknown parameter.
A particular application is to the estimation of a normal mean when the population variance
is known. We now develop the corresponding results for use when the population variance
is unknown. Suppose that in addition to the unknown parameter $\theta$, which is to be estimated
with variance $a(\theta)/\lambda$, there is an unknown nuisance parameter $\phi$. Assume that in
samples of any fixed size $m$, we can find estimates $t^{(m)}$ and $f^{(m)}$ of $\theta$ and $\phi$
such that

(i) $t^{(m)}$ and $f^{(m)}$ are unbiased estimates and have variance $\phi/m$ and
$\kappa\phi^2/m$, where $\kappa$ is asymptotically constant;

(ii) $t^{(m)}$ and $f^{(m)}$ are uncorrelated and $t^{(m)}$ has zero skewness and kurtosis;

(iii) if $m$ is large, asymptotic means and standard errors can be developed for combinations of
$t^{(m)}$, $f^{(m)}$ and $a(t^{(m)})$ by expansion in series.

The following sampling procedure can be justified by an argument exactly analogous to
that in §2.1.

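(a) Take a preliminary sample of size $N\lambda$ and let $t_1$, $f_1$ be the estimates from it.

(b) Put
$$n(t_1, f_1) = a^{-1}(t_1)\, f_1\, \{1 + b(t_1, f_1)/\lambda\}, \tag{17}$$
where
$$b(t_1, f_1) = 2a''(t_1) + \frac{[a'(t_1)]^2}{a(t_1)} + \frac{f_1\, a''(t_1)}{2N a(t_1)} + \frac{\kappa}{N}. \tag{18}$$

(c) Take a second sample of $\max[0, \{n(t_1, f_1) - N\}\lambda]$ and let $t_2$ be the estimate of
$\theta$ from it.

(d) Put
$$t = \begin{cases} t_1 & \text{if } N > n(t_1, f_1), \\[4pt] \dfrac{N t_1 + \{n(t_1, f_1) - N\}\, t_2}{n(t_1, f_1)} & \text{if } N \le n(t_1, f_1), \end{cases} \tag{19}$$
and
$$t' = \begin{cases} t & \text{if } N > n(t_1, f_1), \\ t - a'(t)/\lambda & \text{if } N \le n(t_1, f_1). \end{cases} \tag{20}$$

Then assuming as before that

(iv) the possibility that $N > n(t_1, f_1)$ can be ignored,

we can show that $t'$ has bias $O(\lambda^{-2})$ and variance $a(\theta)\lambda^{-1}\{1 + O(\lambda^{-2})\}$.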
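The sample-size rule with a nuisance parameter can be sketched as below. This assumes the reading $b(t_1, f_1) = 2a'' + (a')^2/a + f_1 a''/(2Na) + \kappa/N$ for the correction term and $n = f_1 a^{-1}(t_1)\{1 + b/\lambda\}$ for the sample-size function; the function names are hypothetical and the derivatives of $a$ are passed in by the caller.

```python
def b_nuisance(t1, f1, a, da, d2a, kappa, N):
    """Correction term b(t1, f1); a, da, d2a are a(.), a'(.), a''(.)."""
    return (
        2.0 * d2a(t1)
        + da(t1) ** 2 / a(t1)
        + f1 * d2a(t1) / (2.0 * N * a(t1))
        + kappa / N
    )


def n_nuisance(t1, f1, a, da, d2a, kappa, N, lam=1.0):
    """Sample-size function n(t1, f1) = f1/a(t1) * (1 + b/lam)."""
    b = b_nuisance(t1, f1, a, da, d2a, kappa, N)
    return f1 / a(t1) * (1.0 + b / lam)


# Check 1: constant a (given standard error), kappa = 2 gives b = 2/N.
b_const = b_nuisance(0.0, 1.0, lambda t: 0.05, lambda t: 0.0, lambda t: 0.0,
                     kappa=2.0, N=10)

# Check 2: a(theta) = 0.01*theta^2 with kappa = 0 reproduces the known-variance
# value 8a + sigma^2/(N theta^2) of Example 2 (f1 plays the role of sigma^2).
b_frac = b_nuisance(2.0, 1.0, lambda t: 0.01 * t * t, lambda t: 0.02 * t,
                    lambda t: 0.02, kappa=0.0, N=10)
```

The second check is a consistency test rather than a new result: setting $\kappa = 0$ removes the cost of estimating $\phi$, so the single-parameter formula should be recovered.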


3.2. Applications


As in §2.2 we set $\lambda = 1$ in applications.

Example 4. Estimation of a normal mean with given standard error $a^{1/2}$. Base the method on
the sample mean. Then $\phi$ is the unknown population variance $\sigma^2$ and is estimated by the
ordinary estimate of variance, i.e. we take $f_1 = s^2$. Then $\kappa = 2$, $a(\theta) = a$ and
$$n(t_1, f_1) = \frac{f_1}{a}\left(1 + \frac{2}{N}\right) = \frac{s^2}{a}\left(1 + \frac{2}{N}\right), \tag{21}$$
and the final estimate is the pooled sample mean $t$, which is easily shown to be exactly
unbiased. The mean sample size is $\sigma^2(1 + 2/N)/a$. Thus the effect of not knowing $\sigma^2$ is
to increase the sample size by the ratio $(1 + 2/N)$.
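Rule (21) needs only the preliminary sample variance. The following sketch uses Python's standard `statistics.variance` (the usual $n-1$ divisor) for $s^2$; the function name and the data are made up for illustration.

```python
from statistics import variance


def example4_total_size(prelim, a):
    """Total sample size from (21): (s^2 / a) * (1 + 2/N)."""
    N = len(prelim)
    s2 = variance(prelim)  # ordinary unbiased estimate of variance
    return (s2 / a) * (1.0 + 2.0 / N)


# Hypothetical preliminary sample of N = 5 observations; target variance a = 0.05.
prelim = [1.0, 2.0, 3.0, 4.0, 5.0]
n_total = example4_total_size(prelim, a=0.05)
second_sample = max(0, round(n_total - len(prelim)))
```

Here $s^2 = 2.5$, so the known-variance size would be 50 and the factor $(1 + 2/N) = 1.4$ raises it to 70; with so small a preliminary sample the penalty for not knowing $\sigma^2$ is large.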

Example 5. Estimation of difference between two normal means with given standard error $a^{1/2}$.
Let $\theta$ be the difference between the means of two normal populations and $\sigma^2$ be twice the
separate population variances, assumed equal. Then (21) of Example 4 can be applied if we
take $f_1$ to be the sum of the sample estimates of variance. First take a preliminary sample
of size $N$ from each population and let $f_1$ be the sum of the ordinary sample estimates of
variance. Put
$$n(t_1, f_1) = \frac{f_1}{a}\left(1 + \frac{1}{N}\right), \tag{22}$$
and take a second sample of $\max[0, n(t_1, f_1) - N]$ from each population. The difference
between the pooled sample means is the required estimate. If the population variances are
not equal, the procedure is still valid if the factor $(1 + N^{-1})$ in (22) is replaced by
$$\{1 + 2(1 + r^2)(1 + r)^{-2} N^{-1}\},$$
where $r = f_1''/f_1'$ is the ratio of the sample variances in the preliminary samples. However,
when the population variances are different, the expected sample size is reduced by dividing
the second sample between the two populations in a way depending on $r$. This possibility
is not analysed here.

More complicated problems with several unknown parameters can be dealt with in 
a similar way. Detailed formulae will not be given here. Examples of problems requiring 
the more general analysis are: 

(i) To estimate the difference between, or the ratio of, two binomial proportions $\theta_1$,
$\theta_2$ with a variance some function of $\theta_1$ and $\theta_2$.

(ii) To estimate the difference $\theta_1 - \theta_2$ between two normal means with a standard
error that is a given percentage of one of the means.

(iii) To estimate the difference between two normal means with unequal population
variances with optimum subdivision of the observations between the two populations.


4. EXPECTED SAMPLE SIZE 


The aim of the double-sampling schemes is to produce estimates with assigned accuracy 
from as few observations as possible. In this section we show that, except where the 
preliminary sample size is small, the best double-sampling procedure has an expected sample 
size only slightly greater than that for the best sequential procedure. 

We consider for simplicity the single-parameter problem of §2. From (9) the expected
sample size is
$$E[\lambda n(t_1)] = \lambda n_0(\theta) + b(\theta)\, n_0(\theta) + v(\theta)\, n_0''(\theta)/(2N) + o(1). \tag{23}$$

To the first approximation the expected sample size is $\lambda n_0(\theta) = \lambda v(\theta)/a(\theta)$.
It follows immediately that the scheme should be based on the estimate $t^{(m)}$ with minimum
variance, and that if instead an estimate of efficiency $Z$ is used, the mean sample size is
increased in the ratio $Z^{-1}$.
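To see how far the expected size (23) lies above the first approximation $\lambda n_0(\theta)$, the sketch below evaluates the three leading terms for the setting of Example 2, where $n_0(\theta) = \sigma^2/(a\theta^2)$ and hence $n_0''(\theta) = 6\sigma^2/(a\theta^4)$. The helper name and numerical inputs are assumptions made for illustration.

```python
def expected_sample_size(n0, b, v, d2n0, N, lam=1.0):
    """Leading terms of (23): lam*n0 + b*n0 + v*n0''/(2N)."""
    return lam * n0 + b * n0 + v * d2n0 / (2.0 * N)


# Example 2 setting with assumed values sigma^2 = 1, a = 0.01, theta = 2, N = 10.
sigma2, a, theta, N = 1.0, 0.01, 2.0, 10
n0 = sigma2 / (a * theta ** 2)                  # first approximation
b = 8.0 * a + sigma2 / (N * theta ** 2)         # correction from Example 2
d2n0 = 6.0 * sigma2 / (a * theta ** 4)          # second derivative of n0
e_n = expected_sample_size(n0, b, sigma2, d2n0, N)
excess_ratio = e_n / n0
```

With these inputs the expected size is 29.5 against a first approximation of 25, an excess of 18%, most of which disappears as $N$ and the sample sizes grow.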

Now $\lambda n_0(\theta)$ is the fixed sample size that gives an estimate with the assigned variance.
It is reasonable to expect, at least when $t^{(m)}$ is a sufficient estimate for $\theta$, that no
sequential scheme can require fewer than $\lambda n_0(\theta)$ observations on the average. This can
be proved formally as follows. Wolfowitz (1947) has generalized the Cramér-Rao inequality by
showing that for any sequential estimation procedure giving an unbiased estimate $\theta^*$ of the
parameter $\theta$ of the distribution $f(x, \theta)$,
$$\operatorname{var}(\theta^*) \ge \left\{E(n)\, E\!\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]\right\}^{-1}, \tag{24}$$
where $E(n)$ is the expected sample size. Now for normal, Poisson and binomial distributions,
the variance of the sample mean is $v(\theta)/m$ where
$$v(\theta) = \left\{E\!\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]\right\}^{-1}.$$
Thus considering estimates for which $\operatorname{var}(\theta^*) \le a(\theta)\lambda^{-1}$ we have for
the above distributions
$$E(n) \ge \lambda v(\theta)/a(\theta) = \lambda n_0(\theta). \tag{25}$$

Equation (25) is satisfied asymptotically for all distributions.

To summarize, the mean sample size in the best double-sampling scheme is (23), and in 
the best sequential scheme it satisfies the inequality (25). 

Equation (23) depends on the assumption that the possibility that $N > n(t_1)$ may be
ignored. If this assumption is not true the mean sample size will be increased and it is
likely that the resulting estimate $t'$ will be more accurate than is required (i.e. that
$(\text{bias of } t')^2 + \operatorname{var}(t') < a(\theta)\lambda^{-1}$). $N$ should be chosen as large
as possible subject to the condition that $\operatorname{prob}(N > n(t_1))$ should be small.


5. CONFIDENCE INTERVALS 


So far we have considered point estimation problems in which the variance of the estimate
is required to be of given form. We now consider estimation by confidence intervals. There
are two cases. First, we may want to give a confidence interval for $\theta$ at the end of one
of the sampling procedures already considered. Secondly, the problem may be stated in
terms of confidence intervals, i.e. we may want to estimate $\theta$ by a confidence interval
of given confidence coefficient and predetermined form.

We shall deal mainly with the first sort of problem. Suppose, in fact, that we have
obtained an estimate $t'$ after a sampling procedure designed to give a variance $a(\theta)/\lambda$.
If we want a $(100 - 2\alpha)$ % confidence interval for $\theta$, we let $g_\alpha$ be the upper
$\alpha$ % point of the standard normal distribution, i.e. the solution of
$$\frac{1}{\sqrt{2\pi}} \int_{g_\alpha}^{\infty} e^{-x^2/2}\, dx = \frac{\alpha}{100},$$
and then define $\theta_-$, $\theta_+$ by the equations
$$\theta_- + g_\alpha\, a^{1/2}(\theta_-)\, \lambda^{-1/2} = t', \qquad \theta_+ - g_\alpha\, a^{1/2}(\theta_+)\, \lambda^{-1/2} = t'. \tag{26}$$

$(\theta_-, \theta_+)$ is then the required confidence interval. If an explicit solution is
impossible, the equations (26) are solved by successive approximation; the method is due to
Bartlett (1937).

Example 6. Suppose that $t'$ is the estimate obtained after the procedure of Example 2 for
estimating a normal mean with given fractional standard error $a^{1/2}$. Then
$a(\theta) = a\theta^2$ and equations (26) give for, say, the 95 % confidence interval
$$\theta_- = t'(1 + 1.96 a^{1/2})^{-1}, \qquad \theta_+ = t'(1 - 1.96 a^{1/2})^{-1}.$$
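When the confidence-limit equations have no explicit solution, they can be solved numerically. The sketch below uses plain fixed-point iteration as one possible form of successive approximation; this choice, the function name and the iteration count are assumptions of the example, and convergence is only to be expected when $g_\alpha a^{1/2}(\theta)\lambda^{-1/2}$ varies slowly with $\theta$. It is checked against the closed form of Example 6.

```python
import math


def confidence_limits(t_prime, a, g, lam=1.0, iters=200):
    """Solve theta_- + g*sqrt(a(theta_-)/lam) = t' and
    theta_+ - g*sqrt(a(theta_+)/lam) = t' by fixed-point iteration."""
    lower = upper = t_prime
    for _ in range(iters):
        lower = t_prime - g * math.sqrt(a(lower) / lam)
        upper = t_prime + g * math.sqrt(a(upper) / lam)
    return lower, upper


# Example 6 setting with assumed values: a(theta) = 0.01*theta^2, t' = 10.
lower, upper = confidence_limits(10.0, lambda th: 0.01 * th * th, g=1.96)
exact_lower = 10.0 / (1.0 + 1.96 * 0.1)
exact_upper = 10.0 / (1.0 - 1.96 * 0.1)
```

In this case each iteration shrinks the error by a factor of about 0.196, so the iterates settle on the closed-form limits very quickly.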


The formulae (26) assume that $t'$ is normally distributed. A refinement of the method
depends on evaluating the skewness $\gamma_1$ and kurtosis $\gamma_2$ of $t'$ and making a correction
for these based on the Cornish-Fisher expansion (Kendall, 1947). We shall not give general
formulae here but will illustrate the method by a simple example.

Example 7. Confidence interval of given breadth for a normal mean, variance unknown.
Consider the construction of a confidence interval after the procedure of Example 4 for
estimating a normal mean $\theta$ with given standard error $a^{1/2}$. If $t'$ is the final estimate,
in this case the sample mean, the $(100 - 2\alpha)$ % confidence interval is, from (26),
$(t' - g_\alpha a^{1/2},\ t' + g_\alpha a^{1/2})$.
Now it can be shown that for the distribution of $t'$, $\gamma_1$ is zero and $\gamma_2$ is $6N^{-1}$.
Thus according to the Cornish-Fisher inversion of the Edgeworth expansion, the normal multiplier
should be replaced by $g_\alpha + (g_\alpha^3 - 3g_\alpha)/(4N) = g_\alpha^*$, say. If we use the normal
multiplier, the width of the confidence interval is $2g_\alpha a^{1/2}$, and if we use the corrected
multiplier, the width is $2g_\alpha^* a^{1/2}$. To solve Stein's problem of arranging that the
confidence interval is of width $\Delta$, we take $a^{1/2} = \Delta/(2g_\alpha)$ or
$\Delta/(2g_\alpha^*)$. The corresponding sample-size functions are, from (21),
$$\frac{4s^2}{\Delta^2}\left(1 + \frac{2}{N}\right) g_\alpha^2, \tag{27}$$
or in the second case
$$\frac{4s^2}{\Delta^2}\left(1 + \frac{2}{N}\right) g_\alpha^{*2} \simeq \frac{4s^2}{\Delta^2}\, g_\alpha^2 \left[1 + \frac{1 + g_\alpha^2}{2N}\right]. \tag{28}$$

In Stein's exact solution the corresponding sample size is $4s^2 \Delta^{-2} t^2_{\alpha, N-1}$, where
$t_{\alpha, N-1}$ is the two-sided $2\alpha$ % point of the $t$ distribution with $N - 1$ degrees of
freedom.
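The two approximate sample-size rules for a confidence interval of width $\Delta$, one using the normal multiplier $g_\alpha$ and one using the kurtosis-corrected multiplier, can be evaluated with the standard library alone. In this sketch the function names are hypothetical, $\alpha$ is given as a percentage, and Stein's exact rule is omitted because it needs a $t$-distribution quantile that the standard library does not provide.

```python
from statistics import NormalDist


def normal_point(alpha_pct):
    """Upper alpha % point g_alpha of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha_pct / 100.0)


def size_27(s2, width, N, alpha_pct):
    """Rule (27): normal multiplier."""
    g = normal_point(alpha_pct)
    return 4.0 * s2 / width ** 2 * (1.0 + 2.0 / N) * g * g


def size_28(s2, width, N, alpha_pct):
    """Rule (28): multiplier corrected for kurtosis, to first order in 1/N."""
    g = normal_point(alpha_pct)
    return 4.0 * s2 / width ** 2 * g * g * (1.0 + (1.0 + g * g) / (2.0 * N))


# Assumed inputs: s^2 = 1, width 1, N = 10, alpha = 2.5 (a 95 % interval).
n27 = size_27(1.0, 1.0, 10, 2.5)
n28 = size_28(1.0, 1.0, 10, 2.5)
```

For these inputs the corrected rule asks for slightly more observations than the uncorrected one, which is the direction needed to approach Stein's exact size.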


Table 1. Estimation of normal mean by $(100 - 2\alpha)$ % confidence interval:
comparison of exact and approximate solutions

          Preliminary sample size 10          Preliminary sample size 20
α %     % error of (27)  % error of (28)    % error of (27)  % error of (28)
10            3.1             -2.7                2.5              0.0
5            -3.4             -4.5                0.0             -1.1
2½           -9.9             -6.8               -3.5             -1.7
1           -18.4            -10.2               -7.7             -2.6
½           -24.6            -13.2              -10.8             -3.5

From the exact solution we can compute the percentage error of the approximate
formulae (27) and (28). This has been done in Table 1. Formula (27) has a fairly small error
even when $N$ is as small as ten, provided that $\alpha$ is not below 2½ %. The correction for
kurtosis makes a substantial improvement. These results suggest that the general approximate
formulae developed in §§2.1, 3.1 will be reasonably accurate except when $N$ is
very small.

If at the end of one of the estimation procedures of §§2, 3 we require to test the hypothesis
$H_0$ that $\theta = \theta_0$, we proceed in an analogous way. If $H_0$ is true, $t'$ is distributed with
mean $\theta_0$ and variance $a(\theta_0)/\lambda$. Hence if we assume $t'$ to be normally distributed,
the $2\alpha$ % significance limits are $\theta_0 \pm g_\alpha\, a^{1/2}(\theta_0)\, \lambda^{-1/2}$.
Corrections for skewness and kurtosis can be introduced if required.


6. DISCUSSION


There are two main situations in which sequential methods are useful. In the first,
observations only become available at infrequent intervals and must be interpreted as soon as
they are obtained. An example is the study of accident rates, which may only be obtainable at
weekly, monthly, etc., intervals. Double-sampling procedures are useless in problems like
this. The second type of situation is where the number of observations is under the
experimenter's control, but observations are expensive, so that the smaller the sample size
needed the better. Double sampling is often practicable in this sort of problem.

Examples occur in the routine testing of textiles. For instance, from time to time it is
required to compare the mean fibre lengths of two lots A and B of wool. There are three
possible purposes for such a comparison. First, we are very occasionally merely interested
in deciding whether or not the two lengths are the same and are not concerned to estimate
the magnitude of any difference that may exist. This is a pure problem of significance
testing to which, for example, a Wald sequential test could be applied. Secondly, the
problem is sometimes to estimate the difference $\theta$ with given accuracy or given percentage
accuracy. This is a pure problem of estimation for which, for example, a double-sampling
procedure could be constructed. The third case is intermediate. Suppose that we want both
to test whether $\theta$ is zero and also to estimate the magnitude of any difference that may exist.
Further, it sometimes happens that while we want a sensitive test of the null hypothesis
$H_0: \theta = 0$, we are content with a comparatively inaccurate estimate when $\theta$ is very
different from zero. A Wald sequential test answers this problem provided that the test is
supplemented by a rule for estimating $\theta$ when sampling stops (see Cox (1952) for one such
approximate rule). The resulting estimate will be very inaccurate if $\theta$ is very different from
zero. Now we can restate the problem: to construct a scheme for testing $H_0$ with given power
near $\theta = 0$ and for estimating $\theta$ with given accuracy when $\theta$ is different from zero.
(For example, if $|\theta| \gg 0$ we might require our estimate to have given standard error
$\sqrt{a'}$.) This suggests trying to solve this sort of problem by using a double-sampling scheme
to estimate $\theta$ with variance $a(\theta)$, where $a(\theta)$ is a function of the general form shown
in Fig. 1, $a''$ and the shape of the curve being adjusted to give approximately the required
power near $\theta = 0$.


[Figure: curve of $a(\theta)$ plotted against $\theta$.]

Fig. 1. Suggested form for $a(\theta)$.


A detailed study of tests of this type will not be given here, but the method will be 
illustrated by a simple example. 

Example 8. Let $\theta$ be the unknown mean of a normal population of unit variance. Suppose
that it is required

(i) to test the hypothesis $H_0: \theta = 0$ and to reject $H_0$ at the 5 % level with probability
0.975 when $\theta = \pm\tfrac{1}{2}$;

(ii) to obtain an unbiased estimate of $\theta$ such that when $\theta$ is very different from zero the
estimate has standard error 0.2.

(When $|\theta| \gg 0$, 25 observations are needed to give the required standard error, while
about 62 observations are needed for a fixed-sample-size test of the required power.)
Suppose that we take $a(\theta)$ to be the inverted normal frequency curve
$$a(\theta) = 0.04 - \frac{\mu}{\sqrt{2\pi}}\, e^{-\theta^2/2}, \tag{29}$$
where $\mu$ is a constant to be chosen. As shown in §5 the estimate $t'$ will differ significantly
from zero at the 5 % level if $|t'| > 1.96\, a^{1/2}(0)$. If this is to be the lower 2½ % point of
the distribution of $t'$ when $\theta = \tfrac{1}{2}$ (the value exceeded with probability 0.975), we
must have
$$1.96\, a^{1/2}(0) = 0.5 - 1.96\, a^{1/2}(\tfrac{1}{2}), \tag{30}$$
and this quadratic equation for $\mu$ gives $\mu = 0.06313$. Now that $a(\theta)$ is fixed we can work
out from (12), (14) and (23) the correction for bias $a'(\theta)$, the sample size function
$n(\theta)$ and the expected sample size $E_\theta(n)$.
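The constant in the inverted normal curve can be recovered numerically. The sketch below assumes the form $a(\theta) = 0.04 - \mu\,\varphi(\theta)$ with $\varphi$ the standard normal density, and solves the power condition $1.96\,a^{1/2}(0) = 0.5 - 1.96\,a^{1/2}(\tfrac12)$ by bisection rather than by the quadratic mentioned in the text; the function names and the bracketing interval are choices made for this example.

```python
import math
from statistics import NormalDist


def a_var(theta, mu):
    """Assigned variance: 0.04 minus mu times the standard normal density."""
    return 0.04 - mu * NormalDist().pdf(theta)


def solve_mu(g=1.96, theta_alt=0.5, lo=0.0, hi=0.1, tol=1e-12):
    """Bisection for mu in g*sqrt(a(0)) + g*sqrt(a(theta_alt)) = theta_alt."""
    def gap(mu):
        return (g * math.sqrt(a_var(0.0, mu))
                + g * math.sqrt(a_var(theta_alt, mu)) - theta_alt)

    # gap decreases in mu: positive at lo, negative at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mu = solve_mu()
critical_value = 1.96 * math.sqrt(a_var(0.0, mu))  # significance limit for |t'|
n0_at_zero = 1.0 / a_var(0.0, mu)                  # first-approximation size at 0
```

The root should land close to the quoted $\mu = 0.06313$; small differences in the last digit can arise from rounding of the normal multiplier.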

These functions† are given in Table 2, together with $a^{1/2}(\theta)$ and $n_0(\theta)$. The final
specification of the sampling scheme is:

(a) take a preliminary sample of 25 and let $t_1$ be the sample mean;

(b) take a second sample to make the total sample size $n(t_1)$;

(c) let $t$ be the mean of the pooled sample and $t' = t - a'(t)$;

(d) $t'$ is an unbiased estimate of $\theta$ and is tested for significance from $\theta = 0$ by
referring $t' a^{-1/2}(0) = t'/0.1217$ to the normal tables; in particular, if $|t'| > 0.2385$,
$t'$ is significant at 5 %.
Table 2. Functions associated with the double sampling scheme

$\theta$    $n(\theta)$    $a'(\theta)$    $E_\theta(n)$    $a^{1/2}(\theta)$    $n_0(\theta)$
0.0       73.2       0.000       72.0        0.122          67.5
0.1       72.5       0.002       71.4        0.122          66.9
0.2       70.5       0.005       69.6        0.124          65.3
0.3       67.5       0.007       66.9        0.126          62.8
0.4       63.7       0.009       63.4        0.129          59.7
0.5       59.6       0.011       59.5        0.133          56.3
0.6       55.3       0.013       55.4        0.138          52.7
0.8       47.4       0.015       47.7        0.147          46.1
1.0       40.8       0.015       41.1        0.157          40.4
1.5       30.7       0.012       30.9        0.178          31.4
2.0       26.6       0.007       26.7        0.191          27.3
2.5       25.4       0.003       25.4        0.197          25.7

$n(\theta)$ is the sample-size function. $-a'(\theta)$ is the correction for bias. $E_\theta(n)$ is the
expected sample size. $a^{1/2}(\theta)$ is the standard error of the estimate. $n_0(\theta)$ is the
crude first approximation to $n(\theta)$.


The advantages and disadvantages of the present procedure are as follows.

Advantages. As compared with a fixed sample size test there is an appreciable saving in
observations if $\theta$ is very different from zero. We can give a test of $H_0$ at any required
significance level and are not committed to a single level as in the Wald test. Once the
preliminary computation of $n(\theta)$ and $a'(\theta)$ has been done, the test requires only one
intermediate calculation. There is a definite upper bound (73) to the sample size.

Disadvantages. There is a slight increase in the expected sample size when $H_0$ is true and
only a trivial reduction when $\theta = \tfrac{1}{2}$; this compares with the substantial reduction in
expected sample size in the Wald test.


† The second-order corrections developed in §2 are small, so that it is likely that the scheme has the
required properties to a high degree of accuracy.
To sum up, the Wald test and the sort of double-sampling test given here serve quite
different purposes. The first is appropriate when we want to make an irrevocable decision,
such as, for example, in deciding whether to accept or to reject a batch of articles. The
second may be useful when a fairly accurate estimate of the unknown parameter must
always be made after the test is complete.


SUMMARY 


Double-sampling methods are developed for estimating an unknown parameter $\theta$ so that
the variance of the estimate is some function of $\theta$ given in advance. Applications are made
to the estimation of normal and binomial means with given standard error or given fractional
standard error, and to the construction of a new sort of sequential test.
THE STATISTICAL SIGNIFICANCE OF ODD
BITS OF INFORMATION


By M. S. BARTLETT
University of Manchester


1. PRINCIPLE 


It sometimes happens that we have a number of realized events $S_n \equiv \mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n$,
and we wish to know how far their occurrence is compatible with a given theory or hypothesis $H$.
It is of course necessary to consider the 'likelihood' of the events,
$p_n(\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n \mid H)$, or equivalently its logarithm $L_n = \log_e p_n$, or
equivalently the 'information' (as defined by I. J. Good, 1950, §6.9), $i_n = -\log_e p_n$. We know
further that the support given by the data to any theory is necessarily relative, or, in other
words, it is the likelihood ratio $p_n/p_n'$ which is more fundamental, where $p_n'$ denotes the
probability of $S_n$ on an alternative hypothesis $H'$.

The quantity $i_n$ is related to the 'average information' $E(i_n) \equiv I_n$ introduced by C. Shannon
(1948) and others in connexion with theories of communication, but the above remarks
indicate (see also my contributed paper to the Symposium on Information Theory (1950))
that $i_n$ or $I_n$ is insufficient for purposes of statistical inference rather than of statistical
specification. However, in many situations where the alternatives are numerous or ill-defined
a more empirical approach has been used, of which the original $\chi^2$ concept introduced
by Karl Pearson is a familiar example. Here the likelihood $p_n$ is effectively used in
comparison with its own sampling fluctuations on a single 'null hypothesis', the argument being
that only if it appears abnormally small (or large) is there any strong reason to suspect the
hypothesis as inadequate.* As a rather different example might be instanced the
combination of probabilities $P$ corresponding to $n$ independent tests of significance, proposed
by R. A. Fisher (1932, §21.1); here $-2\log_e P$ for each test is distributed as a $\chi^2$ with two
degrees of freedom, so that the sum of the $-2\log_e P$ for all the tests is distributed as $\chi^2$ with
$2n$ degrees of freedom. However, in this example the product of the probabilities $P$ is not
a true likelihood in the sense used above, for the possible probabilities for each test do not
define a probability distribution (in the sense of referring to mutually exclusive events):
the method is cited because of its similar summation of $-\log_e P$.

It is suggested that the likelihood function may sometimes be tested even for more
miscellaneous occurrences, the significance of which we wish to appraise. The suggested
test may be stated thus: we may compute $i_n$ and compare its value with its expectation $I_n$ in
relation to its theoretical sampling fluctuation. In the general case of dependent events, the
sampling fluctuation may not always be easy to evaluate,† though special cases such as
long probability chains have been dealt with (Bartlett, 1950). In the case of independent,
or effectively independent, events $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n$, the expectation and variance (and
higher moments if necessary) of $i_n$ are, however, not difficult to evaluate arithmetically,
especially if a table of appropriate quantities is available: a useful approximate test thus
becomes available.

* The more direct use of the likelihood $p_n$ (or any equivalent statistic) in the case of small frequencies
has been proposed by more than one writer for various applications, e.g. by S. S. Wilks (1935), R. A.
Fisher (1950) and R. Frisch (1952), the last-named being especially concerned with the case where the
number of classes is comparable with the number of observations. See also the further comment in
§3(c) regarding the logical basis of the $\chi^2$ test.

† See, however, §5.
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2. FoRMULAE 
For independent events, we have

$$ i_n = \sum_n(-\log_e p) = \sum_n(i), \tag{1} $$

say, summed over the n events; further,

$$ I_n = E(i_n) = \sum_n(I), \tag{2} $$

where $I$ is the average information $E(i)$, and

$$ \sigma_n^2 = \sigma^2(i_n) = \sum_n(\sigma^2), \tag{3} $$

where $\sigma^2$ is the variance of $i$. Moreover, let any event $\mathcal{E}$ be merely one of a complete set of
mutually exclusive events $\mathcal{E}, \mathcal{E}', \mathcal{E}''$, etc., with probabilities $p, p', p''$, etc. Then

$$ I = pi + p'i' + p''i'' + \ldots, \tag{4} $$

$$ \sigma^2 = p(i)^2 + p'(i')^2 + p''(i'')^2 + \ldots - I^2. \tag{5} $$

In the particular but important case where $p' = 1 - p$ ($= q$, say), we have

$$ I = -(p\log_e p + q\log_e q), \tag{6} $$

$$ \sigma^2 = pq(\log_e p/q)^2. \tag{7} $$


Since all moments of $i$ are finite, it is reasonable to suppose that $i_n$ will be approximately
normally distributed about $I_n$ with standard deviation $\sigma_n$ for n not too small. As $i_n$ is
essentially positive, we might at first sight expect an even better approximation to be
given by $\chi^2 \sim Ci_n$ with f degrees of freedom, where C and f are adjusted to agree with the
theoretical mean $I_n$ and variance $\sigma_n^2$, i.e. $C = 2I_n/\sigma_n^2$, $f = CI_n$, but for reasons given in §3 the
normal approximation is recommended. To facilitate the computation of the above
quantities, Table 1 has been constructed giving $i = -\log_e p$ against p for p = 0.50 to 1.00.
(Logarithms to another base could alternatively have been used.) Two situations may then
be distinguished:

(i) $\mathcal{E}$ relates to a simple dichotomy, so that $I$ and $\sigma^2$ are given by formulae (6) and (7).
These are then read off from the table. Their values for the alternative $\mathcal{E}'$ with probability
$q = 1 - p$ may of course be read off from the same entries.

(ii) $\mathcal{E}$ belongs to a set $\mathcal{E}, \mathcal{E}', \mathcal{E}'', \ldots$. The formulae (4) and (5) must then be worked out,
but the values of $-p\log_e p$, $p\log_e^2 p$ given in the table will be helpful. (For $p < 0.50$, we
use $-q\log_e q$, $q\log_e^2 q$ for q = 0.50 to 1.00.)

For small probabilities p, the table may not be sufficiently detailed, but such results as
$-\log_e p = 2.3026 - \log_e 10p$, recalled at the foot of the table, will probably be sufficient
for most cases.
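As a rough illustration of how formulae (4) to (7) can be evaluated without the table, the following sketch computes $I$ and $\sigma^2$ with natural logarithms. The function names are illustrative only and are not part of the paper; terms with zero probability are taken to contribute nothing.

```python
import math

def dichotomy_moments(p):
    """Return (I, sigma^2) for a simple dichotomy, formulae (6) and (7)."""
    q = 1.0 - p
    if p <= 0.0 or p >= 1.0:
        return 0.0, 0.0  # a certain event: I and sigma^2 are both zero
    info = -(p * math.log(p) + q * math.log(q))
    var = p * q * math.log(p / q) ** 2
    return info, var

def general_moments(probs):
    """Return (I, sigma^2) for a complete set of exclusive events, formulae (4) and (5)."""
    # Assumption: an event of probability zero contributes nothing to either sum.
    pairs = [(p, -math.log(p)) for p in probs if p > 0.0]
    info = sum(p * i for p, i in pairs)
    var = sum(p * i * i for p, i in pairs) - info ** 2
    return info, var

I, s2 = dichotomy_moments(0.60)
print(round(I, 4), round(s2, 4))  # compare with the row p = 0.60 of Table 1
```

For a two-event set the general formulae reduce to the dichotomy formulae, so `general_moments([0.6, 0.4])` should agree with `dichotomy_moments(0.6)`.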

3. COMMENTS 


One or two remarks by way of further interpretation or warning are appended. Two 
separate kinds of facts are expected to provide no light on the theory or model under test: 

(a) With the first kind an event $\mathcal{E}$ is certain to occur on the theory. In this case $I$ and
$\sigma^2$ are both zero, and this event makes no contribution. This prevents, as it should,
events which are certain from contributing to the total score. Of course, if this
‘certain’ event $\mathcal{E}$ did not after all occur, we should score $-\log_e q = +\infty$, and our theory
would certainly be rejected. This is the classical critical experiment.

(b) In the second kind of distribution, the events $\mathcal{E}, \mathcal{E}', \mathcal{E}''$ have equal probabilities. In
this case it is immaterial to our theory which of them occurs; this is paralleled by the
Table 1

  p   | -log_e p | I (equation (6)) | σ² (equation (7)) | -p log_e p | p log_e² p | q log_e² q | -q log_e q | -log_e q |  q
 0.50 |  0.6931  |  0.6931  |  0.0000  |  0.3466  |  0.2402  |  0.2402  |  0.3466  |  0.6931  | 0.50
 0.51 |  0.6733  |  0.6929  |  0.0004  |  0.3434  |  0.2312  |  0.2493  |  0.3495  |  0.7133  | 0.49
 0.52 |  0.6539  |  0.6923  |  0.0016  |  0.3400  |  0.2224  |  0.2586  |  0.3523  |  0.7340  | 0.48
 0.53 |  0.6349  |  0.6913  |  0.0036  |  0.3365  |  0.2136  |  0.2679  |  0.3549  |  0.7550  | 0.47
 0.54 |  0.6162  |  0.6899  |  0.0064  |  0.3327  |  0.2050  |  0.2774  |  0.3572  |  0.7765  | 0.46
 0.55 |  0.5978  |  0.6881  |  0.0100  |  0.3288  |  0.1966  |  0.2869  |  0.3593  |  0.7985  | 0.45
 0.56 |  0.5798  |  0.6859  |  0.0143  |  0.3247  |  0.1883  |  0.2966  |  0.3612  |  0.8210  | 0.44
 0.57 |  0.5621  |  0.6833  |  0.0195  |  0.3204  |  0.1801  |  0.3063  |  0.3629  |  0.8440  | 0.43
 0.58 |  0.5447  |  0.6803  |  0.0254  |  0.3159  |  0.1721  |  0.3161  |  0.3644  |  0.8675  | 0.42
 0.59 |  0.5276  |  0.6769  |  0.0320  |  0.3113  |  0.1643  |  0.3259  |  0.3656  |  0.8916  | 0.41
 0.60 |  0.5108  |  0.6730  |  0.0395  |  0.3065  |  0.1566  |  0.3358  |  0.3665  |  0.9163  | 0.40
 0.61 |  0.4943  |  0.6688  |  0.0476  |  0.3015  |  0.1490  |  0.3458  |  0.3672  |  0.9416  | 0.39
 0.62 |  0.4780  |  0.6641  |  0.0565  |  0.2964  |  0.1417  |  0.3558  |  0.3677  |  0.9676  | 0.38
 0.63 |  0.4620  |  0.6590  |  0.0660  |  0.2911  |  0.1345  |  0.3658  |  0.3679  |  0.9943  | 0.37
 0.64 |  0.4463  |  0.6534  |  0.0763  |  0.2856  |  0.1275  |  0.3758  |  0.3678  |  1.0217  | 0.36
 0.65 |  0.4308  |  0.6474  |  0.0872  |  0.2800  |  0.1206  |  0.3857  |  0.3674  |  1.0498  | 0.35
 0.66 |  0.4155  |  0.6410  |  0.0987  |  0.2742  |  0.1140  |  0.3957  |  0.3668  |  1.0788  | 0.34
 0.67 |  0.4005  |  0.6342  |  0.1109  |  0.2683  |  0.1075  |  0.4056  |  0.3659  |  1.1087  | 0.33
 0.68 |  0.3857  |  0.6269  |  0.1236  |  0.2623  |  0.1011  |  0.4155  |  0.3646  |  1.1394  | 0.32
 0.69 |  0.3711  |  0.6191  |  0.1369  |  0.2560  |  0.0950  |  0.4252  |  0.3631  |  1.1712  | 0.31
 0.70 |  0.3567  |  0.6109  |  0.1508  |  0.2497  |  0.0891  |  0.4349  |  0.3612  |  1.2040  | 0.30
 0.71 |  0.3425  |  0.6022  |  0.1651  |  0.2432  |  0.0833  |  0.4444  |  0.3590  |  1.2379  | 0.29
 0.72 |  0.3285  |  0.5930  |  0.1798  |  0.2365  |  0.0777  |  0.4537  |  0.3564  |  1.2730  | 0.28
 0.73 |  0.3147  |  0.5833  |  0.1950  |  0.2297  |  0.0723  |  0.4629  |  0.3535  |  1.3093  | 0.27
 0.74 |  0.3011  |  0.5731  |  0.2105  |  0.2228  |  0.0671  |  0.4718  |  0.3502  |  1.3471  | 0.26
 0.75 |  0.2877  |  0.5623  |  0.2263  |  0.2158  |  0.0621  |  0.4805  |  0.3466  |  1.3863  | 0.25
 0.76 |  0.2744  |  0.5511  |  0.2423  |  0.2086  |  0.0572  |  0.4888  |  0.3425  |  1.4271  | 0.24
 0.77 |  0.2614  |  0.5393  |  0.2586  |  0.2013  |  0.0526  |  0.4968  |  0.3380  |  1.4697  | 0.23
 0.78 |  0.2485  |  0.5269  |  0.2749  |  0.1938  |  0.0482  |  0.5044  |  0.3331  |  1.5141  | 0.22
 0.79 |  0.2357  |  0.5140  |  0.2912  |  0.1862  |  0.0439  |  0.5115  |  0.3277  |  1.5606  | 0.21
 0.80 |  0.2231  |  0.5004  |  0.3075  |  0.1785  |  0.0398  |  0.5181  |  0.3219  |  1.6094  | 0.20
 0.81 |  0.2107  |  0.4862  |  0.3236  |  0.1707  |  0.0360  |  0.5240  |  0.3155  |  1.6607  | 0.19
 0.82 |  0.1985  |  0.4714  |  0.3394  |  0.1627  |  0.0323  |  0.5293  |  0.3087  |  1.7148  | 0.18
 0.83 |  0.1863  |  0.4559  |  0.3548  |  0.1547  |  0.0288  |  0.5338  |  0.3012  |  1.7720  | 0.17
 0.84 |  0.1744  |  0.4397  |  0.3696  |  0.1465  |  0.0255  |  0.5373  |  0.2932  |  1.8326  | 0.16
 0.85 |  0.1625  |  0.4227  |  0.3836  |  0.1381  |  0.0225  |  0.5399  |  0.2846  |  1.8971  | 0.15
 0.86 |  0.1508  |  0.4050  |  0.3968  |  0.1297  |  0.0196  |  0.5412  |  0.2753  |  1.9661  | 0.14
 0.87 |  0.1393  |  0.3864  |  0.4087  |  0.1212  |  0.0169  |  0.5411  |  0.2652  |  2.0402  | 0.13
 0.88 |  0.1278  |  0.3669  |  0.4192  |  0.1125  |  0.0144  |  0.5395  |  0.2544  |  2.1203  | 0.12
 0.89 |  0.1165  |  0.3465  |  0.4279  |  0.1037  |  0.0121  |  0.5359  |  0.2428  |  2.2073  | 0.11
 0.90 |  0.1054  |  0.3251  |  0.4345  |  0.0948  |  0.0100  |  0.5302  |  0.2303  |  2.3026  | 0.10
 0.91 |  0.0943  |  0.3025  |  0.4384  |  0.0858  |  0.0081  |  0.5218  |  0.2167  |  2.4079  | 0.09*
 0.92 |  0.0834  |  0.2788  |  0.4390  |  0.0767  |  0.0064  |  0.5103  |  0.2021  |  2.5257  | 0.08*
 0.93 |  0.0726  |  0.2536  |  0.4356  |  0.0675  |  0.0049  |  0.4950  |  0.1861  |  2.6593  | 0.07*
 0.94 |  0.0619  |  0.2270  |  0.4270  |  0.0582  |  0.0036  |  0.4749  |  0.1688  |  2.8134  | 0.06*
 0.95 |  0.0513  |  0.1985  |  0.4118  |  0.0487  |  0.0025  |  0.4487  |  0.1498  |  2.9957  | 0.05*
 0.96 |  0.0408  |  0.1679  |  0.3878  |  0.0392  |  0.0016  |  0.4144  |  0.1288  |  3.2189  | 0.04*
 0.97 |  0.0305  |  0.1347  |  0.3516  |  0.0295  |  0.0009  |  0.3689  |  0.1052  |  3.5066  | 0.03*
 0.98 |  0.0202  |  0.0980  |  0.2969  |  0.0198  |  0.0004  |  0.3061  |  0.0782  |  3.9120  | 0.02*
 0.99 |  0.0101  |  0.0560  |  0.2090  |  0.0099  |  0.0001  |  0.2121  |  0.0461  |  4.6052  | 0.01*
 1.00 |  0       |  0       |  0       |  0       |  0       |  0       |  0       |  ∞       | 0.00

* $-\log_e 0.099 = -\log_e 0.99 + 2.3026$, $-\log_e 0.0099 = -\log_e 0.99 + 4.6052$, etc.
relation $i - I = 0$, so that no excess or deficit score is registered. It will be noticed that if
the $\chi^2$ approximation were used, such situations would result in a piling up of $I_n$ without
a corresponding increase in the total variance; this is why the normal approximation seems
preferable. (As the corrected third moment $\mu_3$ of $i_n$ can also be evaluated as well as its
variance $\sigma_n^2$, the more appropriate $\chi^2$ approximation of the form $\chi^2 \sim Di_n - F$ is always
available if required. We should take $D = 4\sigma_n^2/\mu_3$, $f = \tfrac{1}{2}D^2\sigma_n^2$, and $F = DI_n - f$.)
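
The constants of the three-moment approximation can be checked by matching moments. The sketch below assumes, as the notation suggests, that $Di_n - F$ is referred to a $\chi^2$ variable with $f$ degrees of freedom, whose mean, variance and third central moment are $f$, $2f$ and $8f$; here $\mu_3$ denotes the third central moment of $i_n$ and $\sigma_n^2$ its variance.

```latex
\begin{align*}
D\,I_n - F &= f            && \text{(means)} \\
D^2\,\sigma_n^2 &= 2f      && \text{(variances)} \\
D^3\,\mu_3 &= 8f           && \text{(third central moments)} \\[4pt]
\frac{D^3\mu_3}{D^2\sigma_n^2} = \frac{8f}{2f} = 4
  \;&\Longrightarrow\; D = \frac{4\sigma_n^2}{\mu_3}, \qquad
  f = \tfrac{1}{2}D^2\sigma_n^2, \qquad F = D\,I_n - f .
\end{align*}
```

Dropping the third-moment condition and setting $F = 0$ gives back the two-moment form, $C = 2I_n/\sigma_n^2$ and $f = CI_n$.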

In this situation with equal probabilities more light on our theory would clearly be 
available if we were comparing it with a rival one for which the probabilities became 
unequal, but it is again emphasized that we are excluding such further considerations. 

A word of warning also seems advisable about the nature of the events $\mathcal{E}, \mathcal{E}', \mathcal{E}'', \ldots$.
These must be well-defined events, and not subjected to subsequent artificial manipulations.
For example, suppose we had two events, $\mathcal{E}$ with probability $p = \tfrac{2}{3}$, and $\mathcal{E}'$ with probability $\tfrac{1}{3}$.
If $\mathcal{E}$ occurs, this is more favourable than if $\mathcal{E}'$ occurs; we must not subsequently divide
$\mathcal{E}$ into two subevents $e_1$ and $e_2$, each with probability $\tfrac{1}{3}$, observe that, of the three events
$e_1$, $e_2$ and $\mathcal{E}'$, $e_1$, say, occurred, but as the probabilities are now equal conclude that no
evidence is available. Similarly, if $\mathcal{E}'$ occurred this would contribute negatively to $i_n - I_n$,
but we could annihilate this deficit by the same device. The danger of such manipulations
will be illustrated again in the numerical example treated in the next section; while the
empirical element present in the test renders it somewhat liable to difficulties of this kind,
a proper care in the definition of the events should prevent any trouble.

(c) As a result of correspondence with Prof. E. S. Pearson, some further words of warning
are appended to this section in relation to the empirical nature of the proposed test and
possible weaknesses it may consequently have. It should be remembered in the standard
$\chi^2$ test that while no specific alternative hypothesis is available, the possible hypotheses
are usually all regarded as independent repetitions of the same probability set, the
probabilities of which can alternatively be estimated by maximum likelihood. This use of
$\log p_n - \log p_{\max}$, where $p_{\max}$ denotes such a maximum likelihood, leads to an asymptotic
quadratic sum of deviations of theoretical and estimated probabilities, increasing with any
discrepancies. With the present test, for which it is assumed that no such maximization
is possible owing to the unsystematic and individual character of the various probabilities,
the only other natural standard which seemed available was the average of $\log p_n$, but if we
consider the power of the resulting test against particular alternatives, the fact that more
than one hypothesis may give the same average of $\log p_n$, particularly if each probability
set refers to more than two contingencies, means that not all alternative hypotheses can be
excluded even in relatively large samples.

If for comparison we consider for a moment the likelihood ratio (logarithm) $\log p_n - \log p_n'$
when a specific alternative hypothesis $H'$ is available, it is possible to obtain the order of
magnitude of the power of this ideal test by approximately identifying a quantity like $\log p_n$
with its appropriate average. This readily gives on the null hypothesis

$$ E\{\log p_n - \log p_n'\} \sim \tfrac{1}{2}\sum_n \sum (\Delta p)^2/p \tag{8} $$

for small $\Delta p$ (p finite), where the inside summation sign relates to the probabilities in any
set and $\Delta p$ the change in any p on the two hypotheses (note that $\sum \Delta p = 0$). Unfortunately,
such a result does not imply that $E'\{\log p_n\}$ is necessarily different from $E\{\log p_n\}$, where
$E'\{\log p_n\}$ denotes the average of $\log p_n$ on some different hypothesis, for $E'\{\log p_n\}$ is a quite
distinct theoretical quantity from $E\{\log p_n'\}$.
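
The quadratic approximation to the expected log likelihood ratio can be checked numerically for a single probability set. The sketch below compares the exact null expectation of $\log p - \log p'$ with $\tfrac{1}{2}\sum(\Delta p)^2/p$; the two probability sets are invented purely for illustration and do not come from the paper.

```python
import math

def expected_log_ratio(p, p_alt):
    """Exact null expectation of log p - log p' for one probability set."""
    return sum(a * math.log(a / b) for a, b in zip(p, p_alt) if a > 0)

def quadratic_approx(p, p_alt):
    """Half the sum of (delta p)^2 / p over the set, valid for small delta p."""
    return 0.5 * sum((b - a) ** 2 / a for a, b in zip(p, p_alt))

# Hypothetical null and alternative sets; the changes sum to zero as required.
p_null = (0.50, 0.30, 0.20)
p_alt = (0.52, 0.29, 0.19)

exact = expected_log_ratio(p_null, p_alt)
approx = quadratic_approx(p_null, p_alt)
print(exact, approx)
```

For changes of this size the two quantities agree to within a few per cent; for n independent sets the contributions are simply added.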
This emphasizes the need for some caution in the use of the proposed test, and its use
would not be recommended for any systematic situations where more standard tests are
available. Even in other cases it may sometimes be advisable to supplement it with, or
replace it by, others; for example, with the use of the likelihood ratio test against an
alternative of particular interest. Thus we might be interested in (i) a comparison with the
alternative in which the contingencies of each set are randomly permuted, so that each
contingency is equally probable, or (ii), in the case of a simple dichotomy for each set, so that

$$ \sum (\Delta p)^2/p = (\Delta p)^2/(pq) \quad (q = 1 - p), \tag{9} $$

finding the constant value of $\Delta p/\sqrt{(pq)}$ which would be significant in the n sets.


4. EXAMPLE 


In connexion with some investigations into the statistical mechanism of epidemics, a mock 
epidemic series was constructed representing measles incidence in a boarding school (see 
Bartlett, 1952). The exact mathematical theory for such a model remains unsolved, but an 
approximate method of investigating the incidence in time of major epidemics is to treat 
the number of susceptibles as a quasi-constant, since its variation during the time-interval 
critical for epidemic incidence seems unimportant. In the particular model in question, 
epidemics could not begin unless new infection entered the school at the beginning of the 
term; a method of testing the validity of the approximate theory was therefore to evaluate 
by means of it the probability of an epidemic beginning at such times (this depending on the 
exact number of susceptibles present, and infectives entering, at the time), and compare 
with the actual realizations. Here of course the future behaviour of the series depended on 
its previous history, but when the instantaneous conditions have been specified the stochastic 
outcome is otherwise independent of previous results, so that we have effectively the case 
of independent events. (This interpretation strictly implies an unorthodox sampling basis 
which is discussed more fully in §5.) 

It is an interesting and essential feature of the stochastic epidemiological theory that an 
epidemic cannot begin until the number of susceptibles has passed its threshold value, the 
probability of ‘extinction’ of infection being otherwise unity. Under such conditions the 
fading out of infection often does not provide much of a check, and it was decided to 
subdivide all cases of fading out of infection into ‘immediate extinction’, defined by the 
non-occurrence of any new cases, and ‘minor outbreaks’, in which a few new cases arose 
before complete extinction resulted. The evaluation of the probabilities on the approximate 
theory will not be given here in detail; it is sufficient to note that it follows from known 
formulae in the theory of the simple birth-and-death stochastic process. The resulting 
probabilities are listed in Table 2, and are all that we require to know for the application of 
the test proposed in this paper. 

Before applying this test, we might note that the realized events corresponding to any 
one of the three classifications are necessarily a selected set, so that nothing can be learnt 
from such a set considered alone. On the further point referred to at the end of §3, notice 
that the ‘three classifications’ probability distribution has been evaluated throughout, 
whatever event occurred ; this avoids any risk that we are choosing our distribution according 
to the realizations. 

As the distribution is more than a simple dichotomy, it is necessary to work out the
contribution to the variance from each event, but this is quite straightforward. The
calculations were made before Table 1 had been constructed, and logarithms to the base 10
were used. It was found, making use of formulae (4) and (5), that

$$ i_n = -\sum(\log_{10} p) = 6.38, \quad I_n = \sum(I) = 6.36, \quad \sigma_n^2 = \sum(\sigma^2) = 0.503, \quad \sigma_n = \sqrt{0.503} = 0.71. $$

It is quite evident that $i_n$ is entirely in accordance with its expectation, and the approximate
theory thus an adequate representation of the more exact theory appropriate to the realized
events in the artificial series, in so far as the features considered above are concerned.


Table 2*

  (1) Immediate extinction   |   (2) Minor outbreaks    |   (3) Major epidemics
   (1)      (2)      (3)     |   (1)     (2)      (3)   |   (1)     (2)     (3)
  [0.50]    0.50     0       |   0.57   [0.43]    0     |   0.00    0.03   [0.97]
  [0.40]    0.28     0.32    |   0.39   [0.24]    0.37  |   0.15    0.26   [0.59]
  [0.14]    0.24     0.62    |   0.56   [0.44]    0     |   0.34    0.18   [0.48]
  [0.47]    0.41     0.12    |   0.61   [0.39]    0     |   0.37    0.20   [0.43]
  [0.37]    0.21     0.42    |   -       -        -     |   0.04    0.03   [0.93]
  [0.34]    0.17     0.49    |   -       -        -     |   0.39    0.26   [0.35]
  [0.74]    0.26     0       |   -       -        -     |   -       -       -
  [0.51]    0.49     0       |   -       -        -     |   -       -       -

* The data are classified according to the category into which the realized event fell. Probabilities
of realized events (heavy type in the printed table) are shown in square brackets.
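
The scores for the three-category classification can be recomputed from the probabilities of Table 2. The sketch below uses logarithms to base 10, as in the text, and assumes that the realized category of each row is the one under which the row is listed. Because the tabulated probabilities are rounded to two places, the results agree with the quoted values only to about the second decimal.

```python
import math

# Each row: probabilities of categories (1), (2), (3), then the index of the realized one.
TABLE_2 = [
    (0.50, 0.50, 0.00, 0), (0.40, 0.28, 0.32, 0), (0.14, 0.24, 0.62, 0),
    (0.47, 0.41, 0.12, 0), (0.37, 0.21, 0.42, 0), (0.34, 0.17, 0.49, 0),
    (0.74, 0.26, 0.00, 0), (0.51, 0.49, 0.00, 0),
    (0.57, 0.43, 0.00, 1), (0.39, 0.24, 0.37, 1), (0.56, 0.44, 0.00, 1),
    (0.61, 0.39, 0.00, 1),
    (0.00, 0.03, 0.97, 2), (0.15, 0.26, 0.59, 2), (0.34, 0.18, 0.48, 2),
    (0.37, 0.20, 0.43, 2), (0.04, 0.03, 0.93, 2), (0.39, 0.26, 0.35, 2),
]

def information_scores(rows):
    """Return (i_n, I_n, sigma_n^2) summed over independent probability sets."""
    i_n = I_n = var_n = 0.0
    for *probs, realized in rows:
        # A category of probability zero contributes nothing to formulae (4) and (5).
        scores = [-math.log10(p) if p > 0 else 0.0 for p in probs]
        mean = sum(p * s for p, s in zip(probs, scores))
        second = sum(p * s * s for p, s in zip(probs, scores))
        i_n += scores[realized]
        I_n += mean
        var_n += second - mean ** 2
    return i_n, I_n, var_n

i_n, I_n, var_n = information_scores(TABLE_2)
z = (i_n - I_n) / math.sqrt(var_n)  # normal deviate for the proposed test
print(round(i_n, 2), round(I_n, 2), round(var_n, 3), round(z, 2))
```

The deviate `z` is very small in absolute value, which is the sense in which the realized score is in accordance with its expectation.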


As a matter of interest, the test was reworked for the simple dichotomy into major 
epidemics or ‘extinction’, events of type (1) and (2) not being distinguished. The cases of 
certain extinction now provide no information (none of them being contradicted), and may 
be excluded. The scores are now 


$$ i_n = 2.80, \quad I_n = 3.00, \quad \sigma_n = 0.58, $$


again giving entirely satisfactory agreement. 

To illustrate the use of the likelihood ratio test against the alternative of a random 
permutation of the probabilities in each set, we note that for the full three-contingency sets 
the log ratio is $-6.38 + 18\log_{10} 3 = 2.21$, passing odds of 100:1 on the Wald-Barnard
sequential limits (see Barnard, 1949, §9). While this further test is not considered of 
particular relevance in this example, its simple character (even for the case of dependent 
observations, discussed more fully below in relation to the empirical test) is worth 
remembering. 

5. DEPENDENT OBSERVATIONS 
It was suggested in the example in §4 that we might treat the events considered there as 
effectively independent provided that at each stage the stochastic outcome was based on 
the events that had materialized up to that stage. This implies a ‘temporal evolution’ of the 
probability specification defining the sampling basis of the test, so that as an event occurs 
it is considered given as far as future events are concerned. As this is different from the 
orthodox sampling basis, in which the whole sample of data is taken to vary en bloc from one
realization to another, it is important to examine its validity rather carefully. When
permissible, the procedure (which might be christened ‘temporary sampling’) can be very
useful, for it provides a means whereby dependent events whose sequence is defined by 
a temporal order can be handled as simply as independent events. 

It will be sufficient to examine the case of two events or samples $\mathcal{E}_1$ and $\mathcal{E}_2$. We shall now
for convenience use the notation

$$ L = \log p(\mathcal{E}_1, \mathcal{E}_2) = \log p(\mathcal{E}_1) + \log p(\mathcal{E}_2 \mid \mathcal{E}_1) = L_1 + L_2', \tag{10} $$

where the dash indicates ‘conditional on $\mathcal{E}_1$ given’. It is in the averaging of L that the choice
of a sampling basis enters. We define

$$ G' = E(L_1) + E'(L_2'), \tag{11} $$

in contrast with

$$ G = E(G') = E(L_1) + E(L_2'), \tag{12} $$

where in (11) $E'$ denotes averaging for given $\mathcal{E}_1$. We are at liberty to consider whichever of
G and G’ seems the more relevant, and the ‘temporary sampling’ basis proposed corresponds 
to the use of G’. Note as a consequence of this approach that an entirely different set of later 
eventualities that might have resulted if the first events had materialized differently is 
considered irrelevant and ignored. 

The ‘variance’ of L is now defined on this basis as the mean square deviation

$$ E(L - G')^2 = E(L_1 - G_1)^2 + E(L_2' - G_2')^2, \tag{13} $$

the contribution from the product $(L_1 - G_1)(L_2' - G_2')$ obviously vanishing from averaging
first over $\mathcal{E}_2$ for given $\mathcal{E}_1$. In this respect formula (13) appears in general simpler than that
for the variance of L in the ordinary sense, viz. $E(L - G)^2$. So far the argument is exact, and
it is in the next step, if taken, that an approximation is introduced. We replace the right-hand
side of (13) by

$$ E(L_1 - G_1)^2 + E'(L_2' - G_2')^2, \tag{14} $$

i.e. we do not complete the averaging over $\mathcal{E}_1$ in the second term. Apart from trivial cases
where (13) and (14) are identical (as may happen if the dependence can be transformed
away by a linear transformation), at least three cases may be cited where this approximation,
which is introduced only in our determination of the standard deviation of the sampling
fluctuation of L and not in the measure $L - G'$ of the sampling fluctuation itself, will be
adequate.

(i) When $\mathcal{E}_2$ is only weakly dependent on $\mathcal{E}_1$.

(ii) When $\mathcal{E}_1$ itself represents an entire sample of some magnitude, for which ‘large-sample’
properties hold, and averaging over different $\mathcal{E}_1$ does not therefore appreciably
affect the result.

(iii) When the above formulae are extended to an entire sequence, which is stationary 
and ‘ergodic’, i.e. the different types of event tend to repeat each other in time, and the 
average over time effectively replaces averaging in probability. 
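
The decomposition of the mean square deviation of L on the ‘temporary sampling’ basis can be verified by direct enumeration for a small case. The sketch below uses two dependent binary events with invented probabilities (not taken from the paper), and also returns the partially averaged approximation for each possible first event, in which the second term is averaged only for that given first event.

```python
import math
from itertools import product

# Hypothetical probabilities: first event, and second event given the first.
P1 = {0: 0.7, 1: 0.3}
P2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def mean_square_deviations():
    """Return (exact LHS, exact RHS, approximation by realized first event)."""
    G1 = sum(p * math.log(p) for p in P1.values())
    # Conditional average of log p(second | first) for each given first event.
    G2 = {e1: sum(p * math.log(p) for p in P2[e1].values()) for e1 in P1}
    lhs = first = second = 0.0
    for e1, e2 in product(P1, (0, 1)):
        joint = P1[e1] * P2[e1][e2]
        L1, L2 = math.log(P1[e1]), math.log(P2[e1][e2])
        lhs += joint * (L1 + L2 - G1 - G2[e1]) ** 2
        first += joint * (L1 - G1) ** 2
        second += joint * (L2 - G2[e1]) ** 2
    # Second term not averaged over the first event: one value per realized e1.
    cond = {e1: sum(p * (math.log(p) - G2[e1]) ** 2 for p in P2[e1].values())
            for e1 in P1}
    approx = {e1: first + cond[e1] for e1 in P1}
    return lhs, first + second, approx

lhs, rhs, approx = mean_square_deviations()
print(lhs, rhs, approx)
```

The first two returned values coincide, since the cross-product term vanishes on averaging the second event for a given first event; the approximate values differ between realizations of the first event, which is why some further justification such as cases (i) to (iii) is needed.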

It is the last case which explains the occurrence of asymptotic multinomial distributions 
whose probabilities are the sets of transition probabilities of a Markoff chain and whose 
numbers of ‘independent’ trials are the observed frequencies for the different states 
(cf. Bartlett, 1950). It is the last case again which justifies the procedure adopted in §4 for 
the numerical example, where the total variance was calculated on the basis of the approxi- 
mation (14) rather than the exact basis (13). Here, however, (ii) is also invoked in one 
sense, for the sequence analysed was only long enough to appeal to (iii) if the detailed states, 
depending on precise numbers of susceptible and infected individuals, are treated as
typical states in the quasi-periodic unfolding of the sequence of events, which extended 
over several complete ‘cycles’. 

It is instructive (though not essential to the rest of this paper) to consider the relation of 
the above method, and further approximation, to the methods appropriate in the case of 
estimation problems. This is done in the following and concluding §6.


6. NOTES ON ESTIMATION, INCLUDING D. G. KENDALL’S PROBLEM 


If for the situation (10) we alternatively require to estimate an unknown parameter $\theta$ we
naturally consider the maximum-likelihood equation

$$ \frac{\partial L}{\partial\theta} = \frac{\partial L_1}{\partial\theta} + \frac{\partial L_2'}{\partial\theta} = 0. \tag{15} $$

In the derivative $\partial L/\partial\theta$ (under the usual differentiation conditions), it will be noticed that
the distinction in the last section between G and G′ disappears, for $\partial G/\partial\theta = \partial G'/\partial\theta = 0$.
Hence only one ‘variance’ for $\partial L/\partial\theta$ arises, corresponding to (13), and giving the ‘information’
on $\theta$ in R. A. Fisher’s sense. If we write this equation as


$$ I = I_1 + E(I_2'), \tag{16} $$

then the approximation (14) would correspond to the approximation

$$ I \sim I_1 + I_2'. \tag{17} $$

This approximation might be used in a problem for which case (ii) (and/or case (i)) of the 
previous section applied, for example, in a genetics experiment, say, where the progeny of 
certain parents provided information on a parameter 0, and these parents were themselves 
some or all of the progeny of previous parents. It might not be worth while (and even 
impracticable if all the progeny of the first generation had not been used in the subsequent 
experiment) to evaluate (16): (17) would therefore be used instead. Of course, as measures
of the sampling variance of our estimate of $\theta$ both expressions are only approximations, and
the completely unaveraged quantity

$$ -\frac{\partial^2 L}{\partial\theta^2} = -\frac{\partial^2 L_1}{\partial\theta^2} - \frac{\partial^2 L_2'}{\partial\theta^2} \tag{18} $$

might well be used instead of either.

The somewhat peculiar estimation problems to which dependence can give rise are well
instanced by the following problem once put to me by D. G. Kendall in a letter (cf. also the
discussion by Girshick, Mosteller & Savage, 1946): n coins are tossed and the number R of
heads noted. A number R of coins is then tossed, and the further number S of heads noted.
How should we estimate the unknown probability p of obtaining heads?

Treating R as the first event $\mathcal{E}_1$ in the notation of §5, and S the second event $\mathcal{E}_2$, we
readily find that

$$ \frac{\partial L}{\partial p} = \frac{1}{pq}(R - np) + \frac{1}{pq}(S - Rp). \tag{19} $$

The maximum-likelihood estimate is thus the ‘obvious’ estimate

$$ \hat{p} = \frac{R + S}{n + R}, \tag{20} $$
but at the same time the distribution of $\hat{p}$ is not the standard one for the estimate of the
probability of a binomial distribution, because of the manner in which the size of the second
sample was determined. In particular, $\hat{p}$ will tend to be negatively biased, because the smaller
it is in the first sample, the less chance of this discrepancy being righted in the combined
sample. It is not difficult to investigate the sampling properties of $\hat{p}$, by considering
averages first for S, given R, and then for R. Thus

[Equation (21): the mean and variance of $\hat{p}$, each with a remainder of order $1/n^2$; the
displayed expressions are not legible.]


so that for reasonable-size n the maximum-likelihood estimate is still correctly providing 
the optimum asymptotic variance 1//, with a bias which can be approximately corrected. 
However, the question remains: what is the correct small-sample theory? 
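
The sampling properties claimed for the coin-tossing problem can be checked by exact enumeration over R and S. The sketch below is an illustration only: the function names and the choice n = 60, p = 0.5 are assumptions made here, and the asymptotic variance used for comparison is $pq/\{n(1+p)\}$, the reciprocal of the information $n(1+p)/(pq)$ that appears in the confidence-interval equation for this problem.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_moments(n, p):
    """Return mean and variance of (R+S)/(n+R), and mean and variance of T(p)."""
    q = 1 - p
    m1 = m2 = t1 = t2 = 0.0
    for R in range(n + 1):
        w_R = binom_pmf(R, n, p)
        for S in range(R + 1):
            w = w_R * binom_pmf(S, R, p)          # joint probability of (R, S)
            est = (R + S) / (n + R)               # maximum-likelihood estimate
            T = (R * q + S - n * p) / (p * q)     # score dL/dp
            m1 += w * est
            m2 += w * est * est
            t1 += w * T
            t2 += w * T * T
    return m1, m2 - m1 ** 2, t1, t2 - t1 ** 2

n, p = 60, 0.5
mean_est, var_est, mean_T, var_T = exact_moments(n, p)
info = n * (1 + p) / (p * (1 - p))
print(mean_est - p, var_est * info, mean_T, var_T / info)
```

The bias `mean_est - p` comes out negative and small, the ratio of the exact variance to 1/I is close to one, and the score has mean zero and variance equal to the information up to rounding error.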

It is suggested that the answer (consistently with Fisher’s statistical estimation theory)
is to treat directly the ‘theoretical sufficient statistic’ given by the expression $T(p) = \partial L/\partial p$
in (19). We have exactly $E(T) = 0$, $\sigma^2(T) = I$, and, moreover, in principle know the
exact distribution of

$$ pq\,T(p) = Rq + S - np \tag{22} $$

for any p, so that confidence statements for p can be set up. As an approximation for the
latter we should use normal theory to give a P = 0.05 confidence interval for p from the
equation

$$ T(p) \pm 1.96\sqrt{\{n(1 + p)/(pq)\}} = 0, \tag{23} $$

or, of course, as a cruder approximation use the estimate $\hat{p}$.
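
The normal-theory interval can be obtained numerically by solving the score equation on either side of the maximum-likelihood estimate. The sketch below divides the score by its standard deviation $\sqrt{\{n(1+p)/(pq)\}}$ and finds by bisection the two values of p at which this standardized score equals +1.96 and -1.96. The function names and the data n = 20, R = 12, S = 7 are invented for illustration.

```python
import math

def standardized_score(n, R, S, p):
    """T(p) divided by its standard deviation sqrt(n(1+p)/(pq))."""
    q = 1 - p
    return (R * q + S - n * p) / math.sqrt(n * (1 + p) * p * q)

def score_interval(n, R, S, z=1.96):
    """Return (lower, estimate, upper) for p from the standardized score."""
    p_hat = (R + S) / (n + R)

    def solve(target, lo, hi):
        # Bisection: the score exceeds the target at lo and falls below it at hi.
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if standardized_score(n, R, S, mid) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    eps = 1e-12  # keep away from p = 0 and p = 1, where the score is unbounded
    return solve(z, eps, p_hat), p_hat, solve(-z, p_hat, 1 - eps)

lower, p_hat, upper = score_interval(20, 12, 7)
print(lower, p_hat, upper)
```

The interval is not symmetrical about the estimate, since the variance of the score depends on p.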

It is worth remembering the value of the direct use of the linear quantity $\partial L/\partial\theta$ in other
estimation problems associated with sequences of events. Thus for a stationary Markoff
process in a normal variable X observed at times t = 0, 1, ..., n, governed by the
autoregressive equation

$$ X_{r+1} - \beta X_r = Y_{r+1}, \tag{24} $$

where $Y_{r+1}$ is an uncorrelated sequence with zero mean, we have (ignoring for simplicity
the first term in L involving $X_0$ alone)

$$ \frac{\partial L}{\partial\beta} \propto T(\beta) = \sum_{r=1}^{n} (X_r - \beta X_{r-1})X_{r-1}. \tag{25} $$

We have exactly $\sigma^2(T) = n\sigma_X^2\sigma_Y^2 = n\sigma_X^4(1 - \beta^2)$, and, in spite of the further approximations
introduced in the estimation of $\sigma^2(T)$, would anticipate a reasonably accurate confidence
interval for $\beta$ from the equation

$$ T(\beta) \pm 1.96 \sum_{r=0} X_r^2 \sqrt{\{(1 - \beta^2)/n\}} = 0. \tag{26} $$
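
A sketch of how the interval for the autoregressive coefficient might be computed is given below. Several points are assumptions made here rather than statements from the paper: the series is simulated with unit innovation variance, the sum of squares is taken over r = 0, ..., n - 1 (the same terms that enter the score), and the two roots are found by bisection on the score divided by its estimated standard deviation.

```python
import math
import random

def simulate_ar1(beta, n, seed=0):
    """Simulate X_0, ..., X_n from a stationary normal first-order scheme."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - beta ** 2))]  # stationary start
    for _ in range(n):
        x.append(beta * x[-1] + rng.gauss(0.0, 1.0))
    return x

def ar1_interval(x, z=1.96):
    """Return (lower, estimate, upper) for beta from the score equation."""
    n = len(x) - 1
    sxy = sum(x[r] * x[r - 1] for r in range(1, n + 1))
    sxx = sum(v * v for v in x[:-1])  # assumed range r = 0, ..., n-1
    beta_hat = sxy / sxx              # root of T(beta) = 0

    def h(b):
        # T(beta) divided by its estimated standard deviation.
        return (sxy - b * sxx) / (sxx * math.sqrt((1.0 - b * b) / n))

    def solve(target, lo, hi):
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if h(mid) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    eps = 1e-12
    return solve(z, -1.0 + eps, beta_hat), beta_hat, solve(-z, beta_hat, 1.0 - eps)

series = simulate_ar1(beta=0.5, n=200, seed=1)
print(ar1_interval(series))
```

The bisection presumes that the estimate lies strictly between -1 and 1, which will ordinarily be so for a stationary series of moderate length.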


I am indebted to Mrs A. Linnert for the preparation of Table 1. 
SAMPLING FROM BIVARIATE NON-NORMAL UNIVERSES
BY MEANS OF COMPOUND NORMAL DISTRIBUTIONS 


By HANNES HYRENIUS 
University of Gothenburg 


1. Among recent investigations of the non-normality effect upon the sampling dis- 
tribution of certain statistics, a number of studies have been based on the assumption that 
the parent population can be described by the first terms of a Gram-Charlier’s A-series 
(see, for example, Gayen, 1950, 1951; Geary, 1947; Quensel, 1938). As an alternative method, 
use has been made of a compound normal frequency distribution (Baker, 1931; Hyrenius, 
1949, 1950). 

Both these types of studies may be called ‘sampling from a common non-normal universe’. 
A basically different approach consists in drawing the various items of the sample from 
different universes, or what may be termed ‘sampling from individual universes’ (normal 
or non-normal). The idea of individual normal universes has been recently used in a number 
of studies, for example, by Quensel (1944, 1947), Robbins (1948) and Weibull (1950, 1951). 

The following article is an extension to the bivariate case of the studies of sampling from 
compound normal distributions. 

After presenting the normal theory in a form suitable for the present purpose, the 
outline of sampling from a general bivariate compound normal distribution is given in 
§3. In §4, the distribution of the means, variances and covariance of the sample are 
derived for the case of varying component media. §5 deals with the distribution of the 
correlation coefficient, and §6 with that of the regression coefficient. 

The author expects to present some further generalizations in a subsequent paper, 
among them a study of the case of varying the variances and covariances of the com- 
ponents. It is also his intention to discuss some numerical applications of the theory in order 
to illustrate the practical effect of non-normality. 


2. This section gives a summary of the normal sampling theory in a form suitable for the 
subsequent generalizations to compound normal frequency distributions. 
The general form of a bivariate normal frequency function (fr.f.) may be written 


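$$ \phi[x_1, x_2] \equiv \phi[x_1, x_2; \alpha_1, \alpha_2, \sigma_1^2, \sigma_2^2, \rho] = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{(1-\rho^2)}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x_1-\alpha_1)^2}{\sigma_1^2} + \frac{(x_2-\alpha_2)^2}{\sigma_2^2} - 2\rho\frac{(x_1-\alpha_1)(x_2-\alpha_2)}{\sigma_1\sigma_2} \right] \right\}. \tag{2·1} $$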


We first give the characteristic function (c.f.) of the first- and second-order moments 
around the means of the parent population, i.e. of 


$\bar{x}_1 = \sum x_1/N$, $\bar{x}_2 = \sum x_2/N$, $m_{20}' = \sum x_1^2/N$, $m_{02}' = \sum x_2^2/N$ and $m_{11}' = \sum x_1 x_2/N$.
Denoting the characteristic variables $t_{10}$, $t_{01}$, $t_{20}$, $t_{02}$ and $t_{11}$ respectively, and introducing
th tati 
e notations Bt = o%(1—p*), F2 = 03(1—p?) (2-2) 


the c.f. can be written U =Q-WerP, (2-3) 
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where Q= all ~ 7 ta | [1 V Fea |- [ p+ 28 tu] (2-4) 














and 
** rarer (a me kel - ras +[ Beg +t] [Ft 
+2] 31 pS egy vf + Ft | [ + 27? |} at a3 — 2p S10 | (2-5) 
a; Py at a3 o10, 
We may also write 
Q=1 mon, ee io t+ a *) [tan tog — 2%). (2-4a) 


In particular, if «, = a, = 0 we have 


y , 20? 262 C.Ge -4iN 
U=[I1 — ps] (1-2 2) (1-5 tue) - (04778) | 
gio 3 7s, ( 263 73 (, 20%, \ 20:7. T1F2, \}\ 
* XP 1571 — py 2Q + fol | — w f02) +H toa( 1 W £20) + yr Sof (P+ | 


The frequency function of $\bar{x}_1$, $\bar{x}_2$, $m_{20}'$, $m_{02}'$ and $m_{11}'$ is obtained by the Inversion Theorem.
Applying this to (2·3), it is found (see, for example, Quensel, 1938) that the second-order
moments always appear in the combinations $m_{20}' - \bar{x}_1^2$, $m_{02}' - \bar{x}_2^2$ and $m_{11}' - \bar{x}_1\bar{x}_2$. Thus,
integration brings out the fr.f. of the sample means $\bar{x}_1$ and $\bar{x}_2$ and the second-order central
moments $m_{20}$, $m_{02}$ and $m_{11}$.

It appears that the joint fr.f. is the product of the two distributions of the means and of the
second-order central moments, which consequently are independent of each other. The
sample means are normally distributed according to the well-known formula

$$ \phi[\bar{x}_1, \bar{x}_2] = \phi\left[\bar{x}_1, \bar{x}_2; \alpha_1, \alpha_2, \frac{\sigma_1^2}{N}, \frac{\sigma_2^2}{N}, \rho\right]. \tag{2·7} $$

For the second-order central moments, the c.f. is

$$ U[t_{20}, t_{02}, t_{11}] = Q^{-\frac{1}{2}(N-1)}, \tag{2·8} $$

where Q is given by (2·4) with the modification that the characteristic variables $t_{20}$, $t_{02}$ and
$t_{11}$ here refer to the central moments instead of the moments around zero.
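The corresponding fr.f. is

$$ \psi[m_{20}, m_{02}, m_{11}] = \frac{N^{N-1}}{4\pi\Gamma(N-2)\,[\sigma_1^2\sigma_2^2(1-\rho^2)]^{\frac{1}{2}(N-1)}}\,[m_{20}m_{02} - m_{11}^2]^{\frac{1}{2}(N-4)} \exp\left\{ -\frac{N}{2(1-\rho^2)} \left[ \frac{m_{20}}{\sigma_1^2} + \frac{m_{02}}{\sigma_2^2} - 2\rho\frac{m_{11}}{\sigma_1\sigma_2} \right] \right\}. \tag{2·9} $$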


3. Letting the subscript ‘i’ indicate a bivariate normal distribution with specific parameter
values, the bivariate compound normal fr.f. is given by

$$ f[x_1, x_2] = \sum_i p_i\,\phi_i[x_1, x_2]. \tag{3·1} $$
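
To make the definition concrete, the following sketch draws observations from a bivariate compound normal distribution by first selecting a component with probability $p_i$ and then sampling from that component's bivariate normal distribution. The component values, the function name and the tuple layout are hypothetical choices made for illustration.

```python
import math
import random

def sample_compound_normal(components, size, seed=0):
    """Draw `size` pairs (x1, x2) from a mixture of bivariate normal components.

    Each component is (p_i, alpha_1i, alpha_2i, sigma_1i, sigma_2i, rho_i).
    """
    rng = random.Random(seed)
    weights = [c[0] for c in components]
    sample = []
    for _ in range(size):
        _, a1, a2, s1, s2, rho = rng.choices(components, weights=weights)[0]
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = a1 + s1 * z1
        # Second coordinate built so that its correlation with the first is rho.
        x2 = a2 + s2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        sample.append((x1, x2))
    return sample

# Hypothetical two-component universe with equal second-order moments.
COMPONENTS = [(0.6, -1.0, 0.0, 1.0, 1.0, 0.5), (0.4, 1.5, 0.5, 1.0, 1.0, 0.5)]
draws = sample_compound_normal(COMPONENTS, size=20000, seed=1)
mean_x1 = sum(x for x, _ in draws) / len(draws)
print(mean_x1)  # should be near the weighted mean of the component means
```

With these components the weighted mean of the first coordinate is $0.6(-1.0) + 0.4(1.5) = 0$, so the printed sample mean should be close to zero.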
Its first moments around zero are

Pyo = UPiXy¢, Mor = VPiXai- } 
Hoo = Lp (C+ 2%), Mi =UplP:Cru Ft %i%ei)» Hoe = UPi(O}: + %s).- 
Mg = UP (3% OF; + O9;), May = Vpy( Weg Pp Og Taj + La Thy + LF; % 4), 
Mya = Xp; (egg Pj, Fi Fp + Hye TH, + %yeA3i), Mos = UP; (Fo%g¢ 73, + O95). 
Hig = Up;(304,; + 6a}, 07; + a4;), (3-2) 
Hay = XP (BP, OF; Oa4 + 34; yi OF; + BOG; Pj O44 a5 + hy Map) 
Mog = Lp LOG, OF (1 + 2p}) + 4004; 0g: Pp O45 O95 + OH}; + OF, OF; + OH, 03,), 
Mig = Up (3p, 04, 09; + 304; Hyg OF; + 303, Pj O45 Fag + Oy 94), 
Hog = Up;(302; + 6x3; 03; + 04,). 





The c.f. of the first- and second-order moments when sampling from (3-1) is given by 
= {Zp,Q; tePip, (3-3) 


where Q; is given by (2-4) and P; by (2-5), if subscripts ‘i’ are added to the parameters 
a, o and p. 


It follows that ! 





U= be Ais tl [pi] i Ub, (3-4) 
(y) (v) 
Bh rd [ey 


where vy stands for one separate combination among all possible combinations of n out 
of N. By writing 


N! ¥ 
got WV pee ty 4 IT [pi ey, (3-5) 
the c.f. is reduced to U = YC, ty” ek{?P i), (3-6) 


4. The next development will be confined to the case of ‘varying component media’.
In other words, it is assumed that $\alpha_{1i}$ and $\alpha_{2i}$ vary from one component to the other, while
the second-order moments of the components are equal, i.e. $\sigma_{1i} = \sigma_1$, $\sigma_{2i} = \sigma_2$, $\rho_i = \rho$.

For the characteristics of this universe we introduce the notation 


a = Up; Xj; 08; (4:1) 
By taking the centre of the system as origin, we first obtain (cf. (3-2)) 
Hig = Pr% + Pe%2+ --. + Pptin = Ajo = 0,| (4-2) 
Hon = P1%ar + Poteet +--+ Pn %on = Ag = 0.) | 
The primes are then consequently dropped on the A’s of second and higher order. The
characteristics of the universe are most easily expressed by the cumulants
Avo = Fi +Ag, Ay = PO,F2+Ay, Age = FF + Age. 
Aso = Aso» Asi = Aer Arz= Ara, Ags = Ags- 
Ago = Ago—3A5q, Agi = Agi—3Ag9Ay, Age = Ave— Avo Age— 24h, 
Ais = Ayg—3Aq2Au, Ag = Agg— 3AGp. 


(4-3) 


2) 


3) 


Ts 


3) 
The c.f. of the first- and second-order moments around the origin follow from (2-3)-(2-5)


and (3-6) as 


We now adopt the following notation: 











Lea, DMs; 
i 
ap-ta-. wet 
Lai, = kPa, 
t 
aX = ra ary) -iz- ‘ 


From these we derive 
(v) __ , 2 — , "2 
3) = Ag9— 458, Ag = Aog— Api, 


_ oo o% 
oN 


: Ao, @ 
We further introduce bw») = + 


—rkY P; 


U=QyGe% 


(4-4) 


D kPa; 9; 
oe So 
a} N 


baa 
| 


al) = 41 — 249M}. (4-6) 


au poi |. 


(4-7) 


For sake of simplicity, v has been omitted in some of the formulae. 

In applying the Inversion Theorem to (4-4), the integration is first performed on the
characteristic variables of the sample means, $t_{10}$ and $t_{01}$. As in the normal case, this
procedure brings out the central moments of the second order rather than the raw moments.
After this first step, the joint fr.f. can be written as


o2 
7 
F[Z,, Xg, Mag, Mog, M4] = xG, x P| 21, 7%; Be, 41.55 NV 


o2 
ed P| 


+a 
l 3 
we, ee e—™20t2o—Mo2 fog— 11 bh 
2n 
—-@ 





2 
: x exp [% to + Qoatoe +11 ti, — Abt a9 to2 + ah Q-HN-D don dtgg dts. 


(4-8) 


The variables $t_{20}$, $t_{02}$ and $t_{11}$ consequently refer here to the central moments.
It is directly clear from (4-8) that for a fixed combination of components, $\nu$, the sample
means and the second-order central moments of the sample vary independently of each
other. Dependence is introduced by weighting them together in the sum for all $\nu$.


Developing the exponential of (4-8) in a series, the fr.f. of the quadratic moments may be 


expressed by 


1 
F[mg9, Mg, 41] = EC, vo (= 1)y 


0 
+ “om, 


02 02 
24 o(4 OMe OMge ~ra) M1 


+(-IP SE P¥at(- PSL Pyst--| 


k 
ee (4-9) 
Here, the brackets [ ] stand for the expression of partial derivatives, while y,, is defined
by (cf. (2-9)) 
- = aka 2 1HN—4)+k 
Y LIMo9, M2, 44] = 4nI(N — 2+ 2k) | o30%(1 —p?®) [gq Moe — 5; ] 


N Meg M2 gg My 
a | ae So (4-10 
x exp | ap) ott oF Pao, |) 1") 





By performing the differentiations indicated within the brackets in (4-9), the total fr.f. 
of the quadratic moments appears as a sum (over v) of infinite series. These show similarities 
with what might be termed three-variable Romanovsky’s expansions of the normal-theory 
distribution. The simple analogy is, however, weakened by the appearance of second-order 
derivatives within the brackets of (4-9). It may be noted that these do not appear if the 
exponential of (4-8) is presented in an alternative form (cf. Hyrenius, 1950). For the present 
purpose, the form given here is to be preferred. 


Retaining terms of the first order in a only, we obtain as a first approximation an 
expression of the type 


F, = DO{1 + Kg + Ky 199 + Km. + K3my3} Wo, 
where the K’s are linear functions of the a’s. 
Now, the sums > C,a}o, etc., can be expressed by means of the characteristics of the 


universe. In particular we have 





= C, ax —; Ag; ZC, aca = ‘ Ag, = C,al) = ‘> y (4:11) 
In order to simplify the formulae further we introduce the new parameters 
(1 —p*) Bap = e+ eSe- mpl, 
(1 p8) Buy = PUR +28 — 2p SU, r (4-12) 
(1-8) By = p+ poe (1 +eSe. 


We then finally obtain the following expression for the approximate fr.f. of the three 
second-order moments 


N-1 
F,[Mo9; Mog, 41] = Pol Mag, Mog, M11] (1. 2(1 — p®) 


eae Meo Moo M1 | 
+ raph | Bo SB + Bea Se 2B, II. (4-13) 


A closer approximation to the exact fr.f. of the three quadratic moments is obtained if
the expression within square brackets [ ] of (4-9) is also retained. After performing the
differentiations and inserting the values of $\sum C_\nu a_{20}^{(\nu)}$, etc., the approximate fr.f. $F_2[m_{20}, m_{02}, m_{11}]$


[Boo + Boz — 2pB,,] 





follows as the normal-theory fr.f., multiplied by a quadratic form in $m_{20}$, $m_{02}$ and $m_{11}$,
containing the parameters $\lambda$ up to order $\lambda_{40}$, etc. Whereas approximation $F_1$ contains
three parameters in addition to the three appearing in the normal theory, approximation
$F_2$ contains nine more.
The marginal distributions as regards $m_{20}$ and $m_{02}$ are, of course, identical with what
has already been demonstrated for the univariate case (Hyrenius, 1949, 1950), namely,
a Romanovsky series of type III.

Writing e-Nmao/2oF 9, HN-1+k—1 








Suv—v+n(Me0) ak THN —1)+k] [202/N]KN-D+k? (4-14) 
we have, with the notations used in the present paper, the complete formula 
— 
FLmm9] = ZC, EZ (— WEP SR —w+4 mao], (4-15) 
and the first approximation thereof 
N-1Agy , N Aggm 
F,[m29] = fy» [Meo] (1 oor A 7 esl 
N—1Agy 1A 
oe — oer 22 | fuy-vlmel+ > alae a of Sawaal]. (4-16) 
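The first characteristics are, for the exact formula (4-15),

$\mu'_1[m_{20}] = \dfrac{N-1}{N}[\sigma_1^2 + \lambda_{20}]$,
$\mu_2[m_{20}] = \dfrac{2(N-1)}{N^2}[\sigma_1^2 + \lambda_{20}]^2 + \dfrac{(N-1)^2}{N^3}[\lambda_{40} - 3\lambda_{20}^2]$.   (4-17)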


For approximation (4-16) we have


N-1 N- N?2-1 
[73 +Ago], HelMeq] = 2 WN? t [8 + Age]? — W2 Abo. (4°18) 











Fy[M9] = 


The distribution of m,, will not be studied here, but some observations may be made 
concerning its first moments. 
In the normal case we have

$\mu'_1[m_{11}] = \dfrac{N-1}{N}\rho\sigma_1\sigma_2$,   $\mu_2[m_{11}] = \dfrac{N-1}{N^2}\sigma_1^2\sigma_2^2(1+\rho^2)$.   (4-19)
The moments in the non-normal case are most easily obtained by differentiating the c.f. 


U[ty] = DC,exp [aa] QytA-», (4-20) 





Q, = 12%, _ mop") 


where vy um WN? 


f,. (4:21) 
Confining ourselves to terms of lowest order in A, we obtain 


as 
Alm] = 2 ge, 02+ Aj], 





ie N-1 N-1 Ag A A 
B{m,,) = = WN? ojox(1 +p*) + rato e+ ese |. (4-22) 
5. The fr.f. of the correlation coefficient, $r = m_{11}/\sqrt{(m_{20} m_{02})}$, is obtained from the joint
fr.f. of $m_{20}$, $m_{02}$ and $m_{11}$ by the transformations


Me = wel, Meg = we, m,, = ur. \ (5-1) 
If, further, we write Iy = [" =e (5-2) 
: ? . [cosh ¢t—pr]*’ ‘ 
the normal-theory fr.f. of r is obtained as 
gtr] = = (1 —pyn-va yor of, ,, (53 


Up to N-* the mean and variance of r are given by 


1 1 
4y[r] = he $p(1 — "7 $p(1 —p) (1+ 3p%), 
(5-4) 
Afr] = yp +z $(1 —p*) (2+ L1p?). 


When sampling from a non-normal universe, the fr.f. of r may be defined as an infinite 
series containing functions J,,.,,,(7). Because of well-known recurrence relations (see, for 


example, Gayen, 1951) these can be transformed into a series of successive derivatives of 
g(r) with regard to p. 


Assuming the universe is given by means of a compound normal distribution with 
varying media, the first approximation of the joint fr.f. of the quadratic sample moments 
(4:13) gives rise to the following fr.f. of the correlation coefficient r: 

G,[r] = Bay + pBoz— 2B oe) 5-5 
il] = g(r) + 4(PBoo + pBog — 2By,)—-— (5-5) 
From this, the mean and the variance can be derived as 
Cy (r) 
Aylr] = 44(r) + 3(p Boo + pBo, — 2Bj;) - 
“Hs at 





(5-6) 








Aalr) = 10(7) + HpBo t+ pBo,—2By ft (r)?. 


Because of what has been shown by Gayen (1951), there seems to be no need to go further
into the non-normality effects on the fr.f. of $r$, especially as Fisher's method of z-transformation
has been shown to allow even a fairly pronounced non-normality.


6. If in the normal-theory fr.f. of $m_{20}$, $m_{02}$ and $m_{11}$ (2-9) we introduce the regression
coefficient $b_{21} = m_{11}/m_{20}$ and integrate over $m_{20}$ and $m_{02}$, the fr.f. of $b_{21}$ appears in the form

Pin] [o3/o — fi, nil 7 (6 1) 
PS) P[3(N — 1)) (ba — Bale (o3/o4— f3,)}’ 


where $\beta_{21} = \rho\sigma_2/\sigma_1$ is the regression coefficient of the universe. The regression coefficient
$b_{21}$ is, in other words, distributed in 'Student's' distribution.
The characteristics of the normal-theory fr.f. of $b$ are


Th] = bl) = 1 o% 
Kil = 2, mlb) = y~3[ 2-H], a 
vil] = 0, y,{0] = 6(N 5). 


h{ba,) = 
When sampling from a general non-normal universe, the fr.f. of $b$ may be expressed by
the normal-theory fr.f. multiplied by a power series in $b$ or, in analogy with the
Gram-Charlier A-series, as an infinite series of the successive derivatives of the 'Student's'
distribution.

Introducing the notation 

Info) = a TUNE (offot— pro 
. 1(3) P[3(N — 1) +k] (6-2) + (o3/of— A)” 
it may further be recalled that the derivatives of even order of h,[b] can be expressed by 
means of h,[b]. The general expression of the non-normal fr.f. of b is thus reducible to a series 
of ‘Student’ distributions together with another set or a single distribution, multiplied by b. 

If the universe is defined by means of a compound normal distribution with varying 
media of the components, we have, when restricting ourselves to approximation (4-13) the 
following expression for the distribution of b 





(6-3) 


N- 
H,{6] = [1- (1 — Sah (Bo + Boa 2pB,,)+(N — 1) Bu | ho() 
| N- N-1 
oi- oH (Boo Boz) hy(b) — 2x1 —p4) (By — pBog) =} bh, (6) 
= Cyho(b) + ¢,hy(b) + ¢, bh, (6). (6-4) 


The exact values of the raw moments of the regression coefficient can be obtained from
the c.f. (see, for example, Quensel, 1938). We thus have for the $i$th moment








2» qd: 1 +00 iT] 
pi[b] = | — — e—M29 tao (2°U Lao» aa] too, (6-5) 
0 Mao 271 J —w \ oti, Jur=0 
4 hf2 
iieae Uin,t.J= Sees [ Sete ty ah Qz tN-», (6-6) 
v 12 
2c? 290710. o%02(1—p? 
} and Q, = 1- Wy t20- at #1, - ATH Pg. (6-7) 


Defining f),(79) by (4:14), we have in particular for the mean 





= , [2 dmg ([TN-1 -1] 
[6] = SC, [ aa —_" poyo,+an| 3 © 2 = 230 FL] 


J0 Mag 





2901 F249 S (—1)Fady 
ae’ N 2, k! 2 [199] . (6-8) 


Retaining only terms of first order in a we have, by using the relations (4-11), the following 


approximate expression: 
7“{b] = aes | eet | ° 
TAU) =A 1 2 + 7 (6-9) 


For the second-order moment of $b$ we obtain in a similar way the approximation


z | +28 eS - Ax], Aw 3 2 ip Au . 


In summing up the findings of this section, it may be noted that the non-normality, as
expressed by a compound normal distribution, does not cause any clear effect on the
sampling distribution of the regression coefficient. There will, for example, be no possibility
of judging a priori whether the corrective terms, added to the normal-theory value of the
variance of $b$, have a positive or a negative dominance at a given stage of approximation.


REFERENCES 


Baker, G. A. (1931). Ann. Math. Statist. 2, 333.
Gayen, A. K. (1950). Biometrika, 37, 236, 399.
Gayen, A. K. (1951). Biometrika, 38, 219.
Geary, R. C. (1947). Biometrika, 34, 209.
Hyrenius, H. (1949). Skand. AktuarTidskr. 32, 180.
Hyrenius, H. (1950). Biometrika, 37, 429.
Quensel, C.-E. (1938). Acta Univ. Lund., N.F., 34, 4. 1.
Quensel, C.-E. (1944). Skand. AktuarTidskr. 27, 210.
Quensel, C.-E. (1947). Skand. AktuarTidskr. 30, 44.
Robbins, H. (1948). Ann. Math. Statist. 19, 406.
Weibull, M. (1950). Skand. AktuarTidskr. 33, 137.
Weibull, M. (1951). Skand. AktuarTidskr. 34, 53.







THE ESTIMATION OF THE POISSON PARAMETER
FROM A TRUNCATED DISTRIBUTION


By P. G. MOORE 
University College, London 


In a recent physical problem the question arose of dealing with a counter which appeared 
to stick at certain numbers when counting radioactive particles. Hence only the earlier 
part of the usual distribution of number of particles emitted in a certain interval of time was 
available. To estimate the Poisson parameter in this case use may be made of the maximum- 
likelihood solution to the problem by Tippett (1932). However, this method is rather 
unwieldy, especially if we have more than four of the individual frequencies available, 
since we are then unable to use the nomograms given by Tippett in his paper. Bliss (1948) 
has developed an approximation to maximum-likelihood procedure and has provided two 
tables necessary for the method. The tables are only available for cases where we have four 
or less of the individual frequencies. The function proposed in this paper is readily calculated 
for any number of cell frequencies and gives a direct estimate of the Poisson parameter 
without the need for subsidiary tables. 


Table 1

No. of emissions per interval    0     1     2     3    ...   r-1      r     ...   Total
Frequency of intervals          n_0   n_1   n_2   n_3   ...  n_{r-1}  n_r    ...    N


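The notation that we will use is given in Table 1, and if $\lambda$ be the Poisson parameter for the
population we have

$\lambda = \dfrac{\lambda e^{-\lambda} + 2\dfrac{\lambda^2 e^{-\lambda}}{2!} + \dots + r\dfrac{\lambda^r e^{-\lambda}}{r!}}{e^{-\lambda} + \lambda e^{-\lambda} + \dots + \dfrac{\lambda^{r-1} e^{-\lambda}}{(r-1)!}} = \sum_{i=0}^{r} \dfrac{i\lambda^i e^{-\lambda}}{i!} \Big/ \sum_{i=0}^{r-1} \dfrac{\lambda^i e^{-\lambda}}{i!}$.   (1)

This suggests that in order to estimate the Poisson parameter from our truncated series we
might take

$x = \sum_{i=0}^{r} i n_i \Big/ \sum_{i=0}^{r-1} n_i$.   (2)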
i=0 i=0 


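The statistic $x$ is extremely simple to calculate, and we will first obtain expressions for
its mean and variance and then consider its applications.
We rewrite $x$ as

$x = \dfrac{\sum_{i=0}^{r} i n_i}{E\left(\sum_{i=0}^{r-1} n_i\right)} \left\{ 1 - \dfrac{\sum_{i=0}^{r-1} n_i - E\left(\sum_{i=0}^{r-1} n_i\right)}{E\left(\sum_{i=0}^{r-1} n_i\right)} \right\}$,

neglecting higher-order terms. Hence

$E(x) = \lambda - E\left[ \sum_{i=0}^{r} i n_i \left\{ \sum_{i=0}^{r-1} n_i - E\left(\sum_{i=0}^{r-1} n_i\right) \right\} \right] \Big/ \left\{ E\left(\sum_{i=0}^{r-1} n_i\right) \right\}^2$.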
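Let $p_i$ correspond to the group frequency $n_i$, so that
$p_i = \lambda^i e^{-\lambda}/i!$;
then we know that $E(n_i n_j) = N(N-1) p_i p_j$ $(i \neq j)$,
$E(n_i^2) = N p_i + N(N-1) p_i^2$.
Using these relationships we get

$E(x) = \lambda - \left[ \sum_{i=0}^{r-1} i p_i - \left( \sum_{i=0}^{r-1} p_i \right) \left( \sum_{i=0}^{r} i p_i \right) \right] \Big/ \left[ N \left( \sum_{i=0}^{r-1} p_i \right)^2 \right]$.   (3)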


Thus the expected value of $x$ is not actually $\lambda$ but is $\lambda$ minus a corrective term. Owing to the
presence of $N$ in the denominator this term is usually very small as will be shown later, and
hence it seems that the method is almost unbiased. Of course, if the original expansion in
$x$ had been taken to a further term, then the next correction would have had $N^2$ in the
denominator and hence would be even smaller. We will next find an approximate expression
for the variance of $x$. Now variance of $x = E(x^2) - \{E(x)\}^2$.


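Neglecting terms after the first, we get

$E(x^2) = E\left\{ \left( \sum_{i=0}^{r} i n_i \right)^2 \right\} \Big/ \left\{ E\left( \sum_{i=0}^{r-1} n_i \right) \right\}^2$

$= \left[ N \sum_{i=0}^{r} i^2 p_i + N(N-1) \sum_{i=0}^{r} i^2 p_i^2 + 2N(N-1) \sum_{i=0}^{r-1} \sum_{j=i+1}^{r} i j p_i p_j \right] \Big/ \left( N \sum_{i=0}^{r-1} p_i \right)^2$

after some algebraic reduction.

$E(x)$ is approximately $\sum_{i=0}^{r} i p_i \big/ \sum_{i=0}^{r-1} p_i$ and hence we get

$\sigma_x^2 = E(x^2) - \{E(x)\}^2$

$= \left[ \sum_{i=0}^{r} i^2 p_i - \sum_{i=0}^{r} i^2 p_i^2 - 2 \sum_{i=0}^{r-1} \sum_{j=i+1}^{r} i j p_i p_j \right] \Big/ \left[ N \left( \sum_{i=0}^{r-1} p_i \right)^2 \right]$

$= \left[ \sum_{i=0}^{r} i^2 p_i - \left( \sum_{i=0}^{r} i p_i \right)^2 \right] \Big/ \left[ N \left( \sum_{i=0}^{r-1} p_i \right)^2 \right]$.   (4)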


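If we substitute sample values for population values and put $p_i = n_i/N$, we get as an estimate
of $\sigma_x^2$

$s_x^2 = \left[ \sum_{i=0}^{r} i^2 n_i - \dfrac{1}{N} \left( \sum_{i=0}^{r} i n_i \right)^2 \right] \Big/ \left( \sum_{i=0}^{r-1} n_i \right)^2$.   (5)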
Table 2

 t       0     1     2     3     4     5     6     7     8    More than 8    Total
 n_t    57   203   383   525   532   408   273   139    45        43          2608


We will now apply our results to two series of data. The first is the same as that used by
Tippett, namely, the data due to Rutherford & Geiger (1910). In Table 2, $t$ is the number of
α-particles observed in an eighth of a minute and $n_t$ is the number of intervals in which
$t$ particles were observed. Using successively $s = r+1 = 2, 3, 4, \dots, 8, 9$ cells we may obtain
our estimates of $\lambda$ from equation (2). These estimates are given in Table 3. The figure for $\lambda$
estimated from the mean of the complete distribution is 3.871 and this value has been used
to calculate the corrective terms from (3) and the standard errors from (4) which are given
in the next two columns. The last two columns of Table 3 give the estimates of $\lambda$, together
with their standard errors, found by following the complete maximum-likelihood procedure
given by Tippett. The value marked with an asterisk was found using Tippett's nomograms.
It can be seen that our values are all within twice the standard error of the estimate from the
value for the complete distribution. The corrective term is very small and hardly seems to
be worth applying.
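As an illustration, the estimates and standard errors for the first few rows of Table 3 can be recomputed from the Rutherford & Geiger counts. This is a minimal sketch, not the author's computation: the function names are hypothetical, equation (2) is read as $x = \sum_{i \le r} i n_i / \sum_{i < r} n_i$, and equation (4) is read as $\sigma_x^2 = [\sum i^2 p_i - (\sum i p_i)^2] / [N (\sum_{i<r} p_i)^2]$ with Poisson probabilities evaluated at 3.871.

```python
from math import exp, factorial, sqrt

# Counts n_t for t = 0..8 from Table 2 (the "more than 8" class is not needed here).
COUNTS = [57, 203, 383, 525, 532, 408, 273, 139, 45]
N_TOTAL = 2608


def truncated_estimate(counts, s):
    """Equation (2) using s = r + 1 cells: sum_{i<=r} i*n_i / sum_{i<r} n_i."""
    r = s - 1
    numerator = sum(i * counts[i] for i in range(r + 1))
    denominator = sum(counts[i] for i in range(r))
    return numerator / denominator


def standard_error(lam, s, n_total):
    """Square root of (4), evaluated at Poisson probabilities with parameter lam."""
    r = s - 1
    p = [lam**i * exp(-lam) / factorial(i) for i in range(r + 1)]
    first = sum(i * p[i] for i in range(r + 1))
    second = sum(i * i * p[i] for i in range(r + 1))
    head = sum(p[:r])  # probability of the first r cells, 0..r-1
    return sqrt((second - first**2) / (n_total * head**2))


for s in range(2, 9):
    est = truncated_estimate(COUNTS, s)
    se = standard_error(3.871, s, N_TOTAL)
    print(s, round(est, 4), round(se, 2))
```

For $s = 2$ the estimate is simply $203/57$, which shows how little arithmetic the method needs compared with the maximum-likelihood procedure.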





Table 3

                                                              Maximum likelihood
 s    Estimate of λ    Corrective term    S.E. of estimate    Estimate of λ    S.E. of estimate
 2       3.5614           -0.0015              0.26              3.8885             0.07
 3       3.7269           +0.0015              0.14              3.9313             0.05
 4       3.9565           +0.0008              0.09              3.91*              0.05
 5       4.0000           +0.0003              0.07              3.9032             0.05
 6       3.9482           +0.0001              0.05              3.8914             0.04
 7       3.9611              -                 0.04              3.8756             0.04
 8       3.9156              -                 0.04              3.8661             0.04
 9       3.8552              -                 0.04              3.8826             0.04


Another example of counts of radioactive particles is given in Table 4, and it will be seen
that there appears to be something the matter with the counter round the region for $t$ equal
to 13, 14 or 15. This feature was noticed with several more counts using the same counter.
If we use the cells up to $t = 12$ and estimate $\lambda$ using equation (2) we find $\lambda = 14.4$;
fitting a Poisson series with 14.4 as mean and the same total, 526, as for the original
distribution we obtain the expected values $\bar{n}_t$ for the cells up to $t = 12$. The agreement is good and
a $\chi^2$ test provides further evidence of the fit. Using the maximum-likelihood procedure the
estimate of $\lambda$ is 14.47.
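The second example can be retraced in a few lines. The sketch below is an illustration only: the observed frequencies are those of Table 4 as read here (cells shown blank in the table are taken as zero), and the variable names are hypothetical. It applies equation (2) to the cells up to $t = 12$ and then computes the Poisson frequencies for the same total of 526.

```python
from math import exp, factorial

# Observed frequencies for t = 0..12, as read from Table 4 (blank cells taken as 0).
observed = [0, 1, 0, 1, 1, 0, 7, 7, 12, 20, 28, 43, 50]
total = 526

r = len(observed) - 1  # last cell used, t = 12
lam = sum(t * observed[t] for t in range(r + 1)) / sum(observed[:r])  # equation (2)

# Fitted Poisson frequencies with mean lam and the same total.
fitted = [total * lam**t * exp(-lam) / factorial(t) for t in range(r + 1)]
remainder = total - sum(fitted)  # frequency expected beyond t = 12

print(round(lam, 2))
for t in range(r + 1):
    print(t, observed[t], round(fitted[t], 2))
print("more than 12:", round(remainder, 2))
```

Only the cells judged reliable enter the estimate, so the suspect counts at 13, 14 and 15 have no influence on it.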


Table 4

 t                0   1    2     3     4     5     6     7      8      9     10     11     12    13   14   15   More than 15   Total
 n_t              -   1    -     1     1     -     7     7     12     20     28     43     50    45   45   70       196         526
 \bar{n}_t        -   -   0.03  0.15  0.53  1.51  3.63  7.47  13.44  21.51  30.97  40.54  48.66  (357.56 for all t above 12)     526


A comparison has been made of the standard errors of Tippett's complete maximum-likelihood
procedure and the proposed procedure. The results are shown in Fig. 1, where
the values of $\sqrt{N}\sigma_x/\lambda$ are plotted against $\lambda$. The unbroken lines refer to the proposed
procedure, whereas the dotted lines refer to Tippett's maximum-likelihood procedure. For
values of $\lambda$ less than one the two methods have much the same standard error. For values
of $\lambda$ from one to three it is, broadly speaking, necessary to take one more cell to get the same
accuracy with the new procedure as with the maximum-likelihood procedure. When $\lambda$ is
from three to five it is necessary to take two more cells for the same accuracy. It must be
remembered, however, that if $N$ is at all large the numerical difference between the standard
errors of the two methods may be very small indeed.

In Table 5 the values of the corrective term, based on a value for $N$ of 100, are given for
various values of $\lambda$ and $s$. Values of the corrective term for any $N$ may be obtained by
multiplying the figures given by $100N^{-1}$. It will be seen that the corrective term is very
small relative to $\lambda$ and provided $N$ is not extremely small may be neglected.
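The corrective term is easy to tabulate directly. The following sketch assumes that equation (3) reads $E(x) = \lambda - [\sum_{i<r} i p_i - (\sum_{i<r} p_i)(\sum_{i \le r} i p_i)] / [N (\sum_{i<r} p_i)^2]$ with Poisson probabilities $p_i$; the function name is hypothetical. It prints the corrective term for $N = 100$, $\lambda = 1, \dots, 5$ and $s = 2, \dots, 6$.

```python
from math import exp, factorial


def corrective_term(lam, s, n_total):
    """Corrective term subtracted from lambda in (3), using s = r + 1 cells."""
    r = s - 1
    p = [lam**i * exp(-lam) / factorial(i) for i in range(r + 1)]
    head = sum(p[:r])                                   # sum of p_i for i < r
    first_head = sum(i * p[i] for i in range(r))        # sum of i*p_i for i < r
    first_all = sum(i * p[i] for i in range(r + 1))     # sum of i*p_i for i <= r
    return (first_head - head * first_all) / (n_total * head**2)


for s in range(2, 7):
    row = [round(corrective_term(lam, s, 100), 4) for lam in range(1, 6)]
    print(s, row)
```

With two cells the term reduces to $-\lambda/N$ whatever the value of $\lambda$, which gives a quick check on any tabulation.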

The usefulness of this method depends largely on whether the standard error of the
estimate is small enough for the purpose in hand. In any particular problem it would be
possible to find an estimate of the mean quickly by the new method and hence obtain from
Fig. 1 an estimate of the standard error of both the maximum-likelihood and the proposed
methods. From these a decision could be made as to the necessity or otherwise for carrying
out the complete maximum-likelihood procedure. The method has obvious applications in
many fields besides the counting of radioactive particles, for instance botanical counts of
plants in quadrats, and although the estimate derived has a larger standard error than that
obtained by carrying out the complete maximum-likelihood procedure the great simplicity
of computation is in its favour.

Table 5. Values of the corrective term for N = 100

                          Values of λ
 s        1          2          3          4          5
 2     -0.0100    -0.0200    -0.0300    -0.0400    -0.0500
 3     -0.0032    -0.0036    +0.0077    +0.0474    +0.1561
 4     -0.0013    -0.0023    +0.0034    +0.0246    +0.0807
 5     -0.0004    -0.0016    +0.0003    +0.0107    +0.0387
 6     -0.0001    -0.0009    -0.0008    +0.0038    +0.0183

Fig. 1. Values of $\sqrt{N}\sigma_x/\lambda$ (vertical scale) plotted against $\lambda$. Unbroken lines: proposed method. Broken lines: maximum-likelihood method. $s$ = number of cells used; $N$ = total number of observations.
THE FITTING OF GROUPED TRUNCATED AND GROUPED
CENSORED NORMAL DISTRIBUTIONS


By P. M. GRUNDY 
Rothamsted Experimental Station 


INTRODUCTION 


The fitting of a truncated normal distribution to a random sample of continuously variable 
observations has been fully discussed by various authors, but fresh problems arise when 
the data are grouped. The effect of grouping may also be considered when the distribution 
is ‘censored’ instead of truncated in the ordinary sense, i.e. when observations in the region 
of truncation are counted, although their values are not recorded. In this paper both the 
truncated and censored cases are investigated, the point of truncation being assumed 
known. A method of finding the maximum-likelihood estimates using ‘adjusted moments’ 
is given, and the effect of grouping on their large-sample covariance matrix is discussed. 
Grouped censored normal distributions fall within the scope of a previous paper by 
N. F. Gjeddebaek (1949), where an iterative method of fitting is given, but the iterative 
method of the present paper is much simpler when the group intervals are not too large, 
especially when they are nearly equal. Even in that case, it is known that Sheppard’s 
adjustments for the uncensored part of the distribution may be misleading (Pairman 
& Pearson, 1919). 

The results of this paper are of use in the fitting of a ‘discrete log-normal’ distribution, 
such as has been studied by the author (Grundy, 1951). It must, however, be made clear 
that the process considered there, of Poisson sampling from a continuous distribution, does 
not lead to a grouped log-normal distribution. For this reason the process of fitting (Grundy, 
in preparation) involves some further adjustments which find no place in the present paper. 

The transformation* from an enumeration variate m to log(m+1) has been used for 
comparative purposes for many years, the transformed variate being treated as normal in 
an asymptotic sense. In particular, we may suppose that log (m+ 1) has a censored grouped 
normal distribution with group boundaries corresponding to m = 0, 1, 2, ...; and it has been 
claimed by Spiller (1948) and Thompson (1951) that this hypothesis works well in entomology 
even when the probability of m = 0 is appreciable. Such a distribution can be fitted with 
the help of Thompson’s tables: but if the data are subjected to further grouping more 
flexible methods become necessary, as shown in the Example below. 

Notation. We consider observations of a variable $x$ for which the range $-\infty < x < x_0$ is
truncated or censored, while values in the range $x > x_0$ are grouped into the intervals with
end-points $x_0, x_1, x_2, \dots$. The length of the $i$th interval is denoted by $h_i = x_i - x_{i-1}$ (assumed
positive), and the midpoint by $q_i = \frac{1}{2}(x_i + x_{i-1})$. The frequency of values in the interval
$(x_{i-1}, x_i)$, from the data of a random sample, is denoted by $f_i$, and here $x_{-1}$ is interpreted
as $-\infty$. The total frequency (including $f_0$ in the censored, but not in the truncated case) is
denoted by $n$. Bold square brackets are used to denote averages weighted with the
frequencies $f_i$ $(i > 0)$: for example

$[q] = \left( \sum_{i>0} f_i q_i \right) \Big/ \left( \sum_{i>0} f_i \right)$.


* For some purposes the transformed variate log (c + m) has been found appropriate (Kleczkowski, 1949).
For a truncated distribution with original probability-density $g(x)$, we define the adjusted
$k$th moment of the sample about the origin to be

$\left[ \int_{x_{i-1}}^{x_i} x^k g(x)\,dx \Big/ \int_{x_{i-1}}^{x_i} g(x)\,dx \right]$.


In practice, the parameters in g(x) being unknown, working values of the adjusted moments 
will be calculated by substituting estimates for the parameters. Revised estimates may 
then be calculated by equating the truncated moments of g(x) to the adjusted sample 
moments, and the whole process repeated if necessary. 

When the ungrouped distribution is truncated normal, it is often possible to evaluate the
adjusted moments ($k = 1, 2$) by simple approximate formulae giving their differences from
the 'raw' moments $[q^k]$. Moreover, the above process can be carried out simply by
substituting the adjusted sample moments in any of the existing methods of fitting the
ungrouped distribution by moments. Such methods have been given by Pearson & Lee
(1908, 1914), Fisher (1931) and Cohen (1949), but the most convenient tables are those of
Hald (1949), which eliminate the need for inverse interpolation. It will be shown
(disregarding the question of convergence) that the above process leads to the
maximum-likelihood estimates based on the grouped data, and we use this fact to get the large-sample
covariance matrix of the estimates.

For the fitting of a censored grouped normal distribution the adjusted moments of the part
of the sample in the range $x > x_0$, as defined above, are again used. These moments are
substituted in the estimation process given by Hald (1949) for the censored but otherwise 
ungrouped case, and again the method leads ultimately to the maximum-likelihood 
estimates. 


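In the rest of this paper, $g(x)$ is taken to be the normal probability density with mean
$\alpha$ and variance $\sigma^2$. We write

$\Phi(u) = \int_{-\infty}^{u} (2\pi)^{-1/2} \exp(-\tfrac{1}{2}t^2)\,dt$,

and denote differentiation by dashes, so that

$\Phi'(u) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2}u^2)$.

We also use the abbreviations $\Phi_i = \Phi((x_i - \alpha)/\sigma)$, $\Phi'_i = \Phi'((x_i - \alpha)/\sigma)$, etc., and

$u_i = (q_i - \alpha)/\sigma$.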


THE ADJUSTED MOMENTS 


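By a straightforward integration, it is found that

$\int_{x_{i-1}}^{x_i} x\,g(x)\,dx \Big/ \int_{x_{i-1}}^{x_i} g(x)\,dx = \alpha - \sigma \dfrac{\Phi'_i - \Phi'_{i-1}}{\Phi_i - \Phi_{i-1}}$.

The first adjusted moment of the truncated sample, say $M_1$, is therefore

$M_1 = \alpha - \sigma \left[ \dfrac{\Phi'_i - \Phi'_{i-1}}{\Phi_i - \Phi_{i-1}} \right]$.   (1)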
Similarly, the second adjusted moment of the truncated sample, say $M_2$, is found to be

$M_2 = \alpha^2 + \sigma^2 - 2\alpha\sigma \left[ \dfrac{\Phi'_i - \Phi'_{i-1}}{\Phi_i - \Phi_{i-1}} \right] + \sigma^2 \left[ \dfrac{\Phi''_i - \Phi''_{i-1}}{\Phi_i - \Phi_{i-1}} \right]$.   (2)

The expectations of $M_1$ and $M_2$ are of course the moments of the truncated population.
In the truncated case, the log-likelihood $L_i$ for a single observation in the $i$th interval is

$L_i = \ln(\Phi_i - \Phi_{i-1}) - \ln(1 - \Phi_0)$.   (3)
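The maximum-likelihood equations are therefore

$0 = \dfrac{\partial [L]}{\partial \alpha} = -\dfrac{1}{\sigma} \left\{ \left[ \dfrac{\Phi'_i - \Phi'_{i-1}}{\Phi_i - \Phi_{i-1}} \right] + \dfrac{\Phi'_0}{1 - \Phi_0} \right\}$,

$0 = \dfrac{\partial [L]}{\partial \sigma} = \dfrac{1}{\sigma} \left\{ \left[ \dfrac{\Phi''_i - \Phi''_{i-1}}{\Phi_i - \Phi_{i-1}} \right] + \dfrac{\Phi''_0}{1 - \Phi_0} \right\}$,

and by (1) and (2) these involve equating $M_1$ and $M_2$ to the truncated population moments.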
This leads to the method of fitting explained in the Introduction, since the tables cited 


provide for fitting the ungrouped distribution by the method of moments. 
In the censored case, the maximum-likelihood equations become 


® i-1 
fog + “Lame. ®,, 0, 


—O%,]_ 
fog" +(n—fo) aa = 0. 





0= 





Hence, using (1) and (2), we obtain 


lots (n OA —a)=0 
5 _ - 
fae_® talline a Y 





The corresponding equations for the ungrouped distribution are given by setting expressions 
(5-3) and (5-4) of Hald (1949) equal to zero; and the only difference is that here the adjusted 
sample moments (about x = a) are used. 

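We shall now obtain approximate formulae which usually give the adjusted moments
with sufficient accuracy. For this purpose, the following expansions involving Hermite
polynomials in $u_i$ are needed:

$\Phi_i - \Phi_{i-1} = \dfrac{h_i}{\sigma} \Phi'(u_i) \left\{ 1 + \dfrac{h_i^2}{24\sigma^2}(u_i^2 - 1) + \dfrac{h_i^4}{1920\sigma^4}(u_i^4 - 6u_i^2 + 3) + \dots \right\}$,

$\Phi'_i - \Phi'_{i-1} = -\dfrac{h_i}{\sigma} \Phi'(u_i) \left\{ u_i + \dfrac{h_i^2}{24\sigma^2}(u_i^3 - 3u_i) + \dfrac{h_i^4}{1920\sigma^4}(u_i^5 - 10u_i^3 + 15u_i) + \dots \right\}$,

$\Phi''_i - \Phi''_{i-1} = \dfrac{h_i}{\sigma} \Phi'(u_i) \left\{ (u_i^2 - 1) + \dfrac{h_i^2}{24\sigma^2}(u_i^4 - 6u_i^2 + 3) + \dfrac{h_i^4}{1920\sigma^4}(u_i^6 - 15u_i^4 + 45u_i^2 - 15) + \dots \right\}$.   (4)

(The expansions follow from Taylor's theorem applied to the various functions of $u_i \pm \frac{1}{2}h_i/\sigma$.)
Hence it is found that

$\dfrac{\Phi'_i - \Phi'_{i-1}}{\Phi_i - \Phi_{i-1}} = -u_i \left\{ 1 - \dfrac{h_i^2}{12\sigma^2} + \dfrac{h_i^4}{720\sigma^4}(u_i^2 + 2) + \dots \right\}$   (5)

and

$\dfrac{\Phi''_i - \Phi''_{i-1}}{\Phi_i - \Phi_{i-1}} = (u_i^2 - 1) - \dfrac{h_i^2}{12\sigma^2}(2u_i^2 - 1) + \dfrac{h_i^4}{360\sigma^4}(u_i^4 + 3u_i^2 - 1) + \dots$.   (6)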


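Neglecting the $h^4/\sigma^4$ term in (5), and substituting in (1), we find that the first adjusted
moment is approximately

$\alpha + \sigma\left( [u] - \left[ \tfrac{1}{12} u h^2/\sigma^2 \right] \right)$,

i.e.   $[q] - \dfrac{[h^2 q] - [h^2]\alpha}{12\sigma^2}$.   (7)

Similarly, the second adjusted moment is approximately

$[q^2] + \dfrac{[h^2]\sigma^2 + 2\alpha[h^2 q] - 2[h^2 q^2]}{12\sigma^2}$.   (8)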














In the particular case when the group intervals are equal, the only sample quantities
involved in (7) and (8) are the raw moments $[q]$ and $[q^2]$.

It may be added that a suggestion of Bliss (1937), that allowance for grouping of the
censored distribution should be made simply by subtracting a 'Sheppard' correction
$\frac{1}{12}(n - f_0)[h^2]/n$ from the estimate of $\sigma^2$, is not equivalent to the above adjustments and
does not appear to have been intended for general application.


ADEQUACY OF THE APPROXIMATIONS 


In practice $\alpha$ and $\sigma$ have to be replaced by estimates, but apart from this the errors in (7)
and (8) are attributable, to a first approximation when the group intervals are small, to the
$h^4/\sigma^4$ terms in (5) and (6). Thus the bias in (7) is the expectation of

$-\left[ \dfrac{h_i^4 u_i (u_i^2 + 2)}{720\sigma^3} \right] + O(h^6)$,

which is

$-\dfrac{1}{1 - \Phi_0} \sum_i \dfrac{h_i^5 u_i (u_i^2 + 2) \Phi'(u_i)}{720\sigma^4} + O(h^6)$.

Since   $\Phi'''_i - \Phi'''_{i-1} = -\dfrac{h_i}{\sigma} \Phi'(u_i)(u_i^3 - 3u_i) + O(h^3)$

(similarly to (4)), the bias may also be written as

$\dfrac{1}{720\sigma^3(1 - \Phi_0)} \sum_i h_i^4 \left\{ \Phi'''_i - \Phi'''_{i-1} + 5(\Phi'_i - \Phi'_{i-1}) \right\} + O(h^6)$;

and when the group intervals are equal the principal part of this expression reduces to

$\dfrac{-h^4(\Phi'''_0 + 5\Phi'_0)}{720\sigma^3(1 - \Phi_0)}$.   (9)




Similarly the bias in (8) is found to be


-1 


360051 — = ht {o( O' — Of, + (07 — O}_,) + 5(®, — ,_,)) 


— (Dj — DF_, + 5(D; — O}_,))} + O(h*). 
When the group intervals are equal the principal part of this expression reduces, after


simplification, to —h4(5(1 — ©.) — 2. Of +2 0-4 OF + 5@%)), 


3600%(1—®,) a 
and here the component involving x), which may be compared with (9), disappears when 
moments are taken about the point of truncation. 

To illustrate the effect of moderately coarse grouping, some numerical values of (9) and 
(10) are given in Table 1. In each case the value of h is taken just large enough for one third 
of the truncated ungrouped population to be included in an interval of length h. 


Table 1. Examples of the approximate bias in (7) and (8)

(Values calculated from (9) and (10), relating to adjusted moments about the point of truncation,
with uniform grouping, σ = 1, x_0 = 0.)

                        Negative bias in
                        adjusted moments           Truncated population values
   α        h        1st (×10^-3)   2nd (×10^-3)       Mean       Variance
 +2.0     0.8406        0.30           6.6            2.0552       0.8865
 +1.5     0.8006        0.49           5.2            1.6388       0.7726
 +1.0     0.7181        0.53           3.3            1.2876       0.6297
 +0.5     0.5860        0.35           1.5            1.0092       0.4862
  0       0.4307        0.15           0.48           0.7979       0.3634
 -0.5     0.3215        0.072          0.18           0.6411       0.2685
 -1.0     0.2493        0.041          0.086          0.5251       0.1991
 -1.5     0.2003        0.027          0.048          0.4387       0.1495
 -2.0     0.1657        0.020          0.030          0.3732       0.1143

Even in the most unfavourable case, the approximate bias in the first adjusted moment
is less than 0.1 % of the population mean and of the population standard deviation; in the
second adjusted moment it is less than 1 % of the population variance.
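The columns of Table 1 headed $h$, Mean and Variance can be recomputed from the normal distribution function alone. The sketch below rests on an assumed reading of the text: $h$ is taken as the length of the shortest interval inside the range $x > 0$ that contains one third of the truncated population (centred on $\alpha$ when that interval fits, otherwise starting at the point of truncation), with $\sigma = 1$ and $x_0 = 0$. The function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, sigma = 1


def grouping_width(alpha):
    """Shortest interval length holding one third of N(alpha, 1) truncated below at 0."""
    retained = STD.cdf(alpha)  # probability of x > 0 before truncation
    centred = 2 * STD.inv_cdf(0.5 + retained / 6)  # interval centred on the mode alpha
    if centred / 2 <= alpha:
        return centred
    # Otherwise the densest admissible interval starts at the truncation point.
    return STD.inv_cdf(STD.cdf(-alpha) + retained / 3) + alpha


def truncated_moments(alpha):
    """Mean (about the truncation point 0) and variance of N(alpha, 1) truncated below at 0."""
    ratio = STD.pdf(alpha) / STD.cdf(alpha)
    return alpha + ratio, 1 - alpha * ratio - ratio**2


for alpha in (2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5, -2.0):
    mean, var = truncated_moments(alpha)
    print(alpha, round(grouping_width(alpha), 4), round(mean, 4), round(var, 4))
```

Under this reading the grouping is coarse relative to the spread of the truncated population in every row, which is what makes the small tabulated biases informative.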


THE INFORMATION AND COVARIANCE MATRICES 


The information matrix of the truncated normal distribution has been given by Fisher 
(1931), and that of the censored normal distribution by Stevens (Bliss, 1937) and more 
explicitly by Hald (1949). In both cases the inverse matrix (the large-sample covariance 
matrix of the estimates) has been tabulated in dimensionless form by Hald (1949). It 
remains to consider the effect of grouping, and convenient approximate formulae for the 
matrix of loss of information will now be obtained by a direct method. (For exact formulae 
in the grouped censored case, see Gjeddebaek, 1949.) 

The $\alpha\alpha$ element of the information matrix of the grouped truncated normal distribution
is the expectation of $-n\,\partial^2 L_i/\partial\alpha^2$, where $L_i$ is given by (3) and carries the probability of the
$i$th group. Using (4) we obtain

$L_i = \ln\left( h_i \Phi'(u_i)/\sigma(1 - \Phi_0) \right) + \ln\left\{ 1 + \dfrac{h_i^2}{24\sigma^2}(u_i^2 - 1) + \dots \right\}$,

that is   $L_i = \tfrac{1}{2}\ln(h_i^2/2\pi\sigma^2) - \ln(1 - \Phi_0) - \tfrac{1}{2}u_i^2 + \dfrac{h_i^2}{24\sigma^2}(u_i^2 - 1) + \dots$.

Hence   $-n\dfrac{\partial^2 L_i}{\partial\alpha^2} = n\dfrac{\partial^2}{\partial\alpha^2}\ln(1 - \Phi_0) + \dfrac{n}{\sigma^2}\left\{ 1 - \dfrac{h_i^2}{12\sigma^2} + \dots \right\}$.

Here the terms free from $h$ represent the information in the ungrouped distribution. The
leading term in the loss due to grouping is therefore the expectation of $\frac{1}{12}nh_i^2/\sigma^4$; and on
large-sample theory this may be replaced by $\frac{1}{12}n[h^2]/\sigma^4$.

Again OL, o2 h2 


ie = N~ Do (L— Po) + =, {2u.- 302 +4, 








: te n { O;—- Oj) Aju; 
and by (5) this = nao mB (1—@,)+ a (5-5) Got tof" 


Here the above argument applies, because the expectation of the terms free from $h$ is
independent of the grouping; consequently the loss due to grouping is approximately
$\frac{1}{6}n[h^2 u]/\sigma^4$. Similarly, the loss in the $\sigma\sigma$ component is approximately $\frac{1}{3}n[h^2 u^2]/\sigma^4$. Thus
the loss in the information matrix, due to grouping the truncated normal distribution, is
approximately

$\dfrac{n}{\sigma^2} \begin{pmatrix} \xi & \zeta \\ \zeta & \eta \end{pmatrix}$,

where   $\xi = [h^2]/12\sigma^2$,   $\zeta = \tfrac{1}{6}([h^2 q] - [h^2]\alpha)/\sigma^3$,
$\eta = \tfrac{1}{3}([h^2 q^2] - 2[h^2 q]\alpha + [h^2]\alpha^2)/\sigma^4$.

Similar calculations show that the loss due to grouping a normal distribution already
censored is given by the same formulae, except that $n$ must be replaced by $n - f_0$.

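In the practical fitting of a truncated or censored normal distribution, Hald's Tables
II and IV give the quantities $\mu_{ij}$ such that the large-sample covariance matrix of the
estimates of $\alpha$ and $\sigma$ from an ungrouped sample is

$\dfrac{\sigma^2}{n} \begin{pmatrix} \mu_{11} & \mu_{12} \\ \mu_{12} & \mu_{22} \end{pmatrix}$.

The effect of grouping is that the $\mu_{ij}$ are replaced by $\mu_{ij} + \delta\mu_{ij}$, where, in the truncated case,

$\begin{pmatrix} \mu_{11} + \delta\mu_{11} & \mu_{12} + \delta\mu_{12} \\ \mu_{12} + \delta\mu_{12} & \mu_{22} + \delta\mu_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \mu_{11} & \mu_{12} \\ \mu_{12} & \mu_{22} \end{pmatrix}^{-1} - \begin{pmatrix} \xi & \zeta \\ \zeta & \eta \end{pmatrix}$.

Hence, treating the squares of $\xi$, $\zeta$ and $\eta$ as negligible, we obtain

$\delta\mu_{11} = \xi\mu_{11}^2 + 2\zeta\mu_{11}\mu_{12} + \eta\mu_{12}^2$,
$\delta\mu_{12} = \xi\mu_{11}\mu_{12} + \zeta(\mu_{11}\mu_{22} + \mu_{12}^2) + \eta\mu_{12}\mu_{22}$,
$\delta\mu_{22} = \xi\mu_{12}^2 + 2\zeta\mu_{12}\mu_{22} + \eta\mu_{22}^2$.   (11)

In the censored case, these formulae for the $\delta\mu_{ij}$ require to be multiplied by $(n - f_0)/n$.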


EXAMPLE 
The frequencies in Table 2 are obtained by grouping the data of Spiller (1948) giving 
the numbers of red scales on 720 citrus leaves. A grouped censored normal distribution is 
fitted after transformation to logarithms (to the base 2) as described in the Introduction. 
In this case the adjustments come near to those of Sheppard, but it will be seen that there 
is a point of interest in the interpretation of these particular results. 
The raw moments are [q] = 2.771, [q²] = 9.962.

By plotting the probits of the cumulative frequencies, the preliminary estimates 2.25 and 1.78 of α and σ respectively were obtained. From (7) and (8) the adjusted first and second moments are

2.771 - (2.771 - 2.25)/38.02 = 2.757

and

9.962 + (3.168 + 5.50 × 2.771 - 2 × 9.962)/38.02 = 9.922.

Accordingly the independent variable 'y' of Hald's Table III is 9.922/{2 × (2.757)²} = 0.6527; his 'h' (our $f_0/n$) is 89/720 = 0.124; and interpolation in the table gives the value -1.217 for his 'z' (our $(x_0-\alpha)/\sigma$). Using Hald's Table IV we obtain 'g(h, z)' = 0.686, which leads to the estimates

of σ, 2.757 × 0.686 = 1.891;
of α, 1.891 × 1.217 = 2.301.


Table 2. Observed and expected distributions of red scales on 720 citrus leaves

Scales      Frequency   Logarithmic          Expected frequency
per leaf    of leaves   range          2q    Censored   Truncated
0              89       (-∞)-0         -       78.7        -
1              79       0-1            1       96.2       79.6
2-3           136       1-2            3      138.2      131.7
4-7           154       2-3            5      150.3      156.0
8-15          124       3-4            7      124.3      132.8
16-31          88       4-5            9       77.7       81.1
32-63          37       5-6           11       36.9       35.7
64-127         11       6-7           13   }   17.7       14.1
128-255         2       7-8           15   }
Totals        720       -              -      720.0      631.0
A second cycle of calculations gives the revised estimates z = -1.230, σ = 1.877, α = 2.309, and it is clear that further iteration is unnecessary.

Interpolating in Hald's Table IV with this last value of z, we obtain $\mu_{11}$ = 1.023, $\mu_{12}$ = -0.046, $\mu_{22}$ = 0.595. Equations (11) give ε = 0.0237, ζ = 0.0116, η = 0.0671, and hence we get $\delta\mu_{11}$ = 0.021, $\delta\mu_{12}$ = 0.004, $\delta\mu_{22}$ = 0.020. Thus on large-sample theory the estimated covariance matrix of the estimates is

$$\frac{3.523}{720}\begin{pmatrix}1.044 & -0.042\\ -0.042 & 0.615\end{pmatrix} = \frac{1}{100}\begin{pmatrix}0.511 & -0.021\\ -0.021 & 0.301\end{pmatrix}.$$
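The first-order corrections quoted above can be checked numerically. The following is a minimal sketch, assuming the corrections take the form δμ = M E M (with M the matrix of the $\mu_{ij}$ and E the matrix of ε, ζ, η) scaled by $(n-f_0)/n$ in the censored case; the function name `grouping_corrections` is hypothetical and introduced only for illustration.

```python
def grouping_corrections(mu11, mu12, mu22, eps, zeta, eta, scale=1.0):
    """First-order corrections delta_mu = scale * (M E M) for symmetric 2x2 M and E."""
    d11 = eps * mu11**2 + 2 * zeta * mu11 * mu12 + eta * mu12**2
    d12 = eps * mu11 * mu12 + zeta * (mu11 * mu22 + mu12**2) + eta * mu12 * mu22
    d22 = eps * mu12**2 + 2 * zeta * mu12 * mu22 + eta * mu22**2
    return scale * d11, scale * d12, scale * d22

# Values quoted in the example: n = 720 leaves, f0 = 89 leaves with no scales.
n, f0 = 720, 89
d11, d12, d22 = grouping_corrections(
    1.023, -0.046, 0.595,      # mu_11, mu_12, mu_22 from Hald's Table IV
    0.0237, 0.0116, 0.0671,    # epsilon, zeta, eta from equations (11)
    scale=(n - f0) / n,        # censored case
)
print(round(d11, 3), round(d12, 3), round(d22, 3))
```

To three decimals the results agree with the corrections 0.021, 0.004 and 0.020 given in the text.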

It is interesting to consider also the estimates obtained by fitting a truncated distribution, i.e. by ignoring the information about the number of leaves without red scales. In this case preliminary estimates can conveniently be obtained by using the raw moments in Hald's tables. After two cycles of corrections, the final estimates are α = 2.513, σ = 1.712.

The expected frequencies calculated from the estimates including and excluding the information about the zero class are shown in the last two columns of Table 2. The values of χ² are 7.18 (5 D.F.) and 1.48 (4 D.F.), neither of which is significant. The difference, a χ² component of 5.70 with 1 D.F., does, however, exceed the 2 % significance level, and thus the observed frequency of zeros appears to be anomalous.
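The χ² values can be recomputed from the frequencies of Table 2. The sketch below rests on two assumptions that are not stated explicitly in the text: the two highest classes are pooled, with their expected frequency obtained by difference from the column totals (720.0 and 631.0), and the censored expectation for 16-31 scales is read as 77.7.

```python
# Observed frequencies; the last two classes (64-127 and 128-255) are pooled.
observed = [89, 79, 136, 154, 124, 88, 37, 11 + 2]

# Expected frequencies as read from Table 2 (77.7 is an assumed reading).
censored = [78.7, 96.2, 138.2, 150.3, 124.3, 77.7, 36.9]
truncated = [79.6, 131.7, 156.0, 132.8, 81.1, 35.7]

# Pooled tail expectation by difference from the column totals.
censored.append(720.0 - sum(censored))
truncated.append(631.0 - sum(truncated))

def chi_square(obs, exp):
    """Pearson chi-square statistic for matched observed/expected lists."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

chi_censored = chi_square(observed, censored)        # all classes, 5 D.F.
chi_truncated = chi_square(observed[1:], truncated)  # zero class excluded, 4 D.F.
print(round(chi_censored, 2), round(chi_truncated, 2),
      round(chi_censored - chi_truncated, 2))
```

Because the tabulated expectations are rounded to one decimal, the recomputed statistics agree with 7.18, 1.48 and 5.70 only to within a few hundredths.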


SUMMARY 


The effect of grouping a truncated or a censored normal distribution is considered in this 
paper. It is shown that a process involving ‘adjusted sample moments’, used in con- 
junction with the published tables relating to the ungrouped distributions, is equivalent 
to maximum likelihood estimation. Approximate formulae for the adjusted moments are 
given, which become particularly simple in the special case when the group-intervals are 
equal, and the accuracy of the approximations is discussed. The effect of grouping on the 
information and covariance matrices is also investigated, and a numerical example given. 








ESTIMATION OF THE MEAN AND STANDARD DEVIATION OF
A NORMAL POPULATION FROM A CENSORED SAMPLE 


By A. K. GUPTA 
Department of Applied Mathematics, University of Liverpool 


1. INTRODUCTION 


The problem of estimation from a truncated normal distribution has been treated by 
several authors. Two different kinds of problem arise, depending on whether the population 
from which the sample is drawn is truncated or the sample itself is truncated, the population 
being complete. To distinguish between the two cases Hald (1949) called them truncated 
and censored respectively. The population can be singly truncated at a known truncation 
point or doubly truncated about two known truncation points. Samples can, however, 
be censored in two different ways: (1) observations below or above a given point (also 
called a truncation point) may be censored; (2) the (n — k) smallest or greatest observations 
out of a sample of size n may be censored. These two kinds of censored sample will be denoted 
as Type I and Type II respectively. Natural extension of the two types will lead to two 
kinds of doubly censored sample. Some solutions of the estimation problem from censored
samples of Type II will be given in this paper.

K. Pearson & Lee (1908), Fisher (1931) and Cohen (1950) have considered the problem 
of estimation from a truncated population; Stevens (1937), Hald (1949) and Cohen (1950) 
that of a Type I censored sample. Some examples are given below to illustrate how problems 
of Type I and Type II arise in practical situations. 

(a) Suppose it is desired to estimate the average life of electric lamps produced in 
a factory. The straightforward method would be to take a certain number of lamps at 
random and burn them out to get the required data for analysis. Instead of wasting the 
lamps it might be decided to stop the experiment when a fixed number have burnt out. 
The random sample thus obtained would be a censored sample of Type II. Instead of 
stopping the experiment after a fixed number have fused, a decision may be taken to stop 
it, say, after 1000 hr. The random sample in this case will be one of Type I. 

(b) Biologists are often required to perform experiments on rabbits or mice to determine 
the effect of certain drugs on them. A fixed number of animals are exposed to the drug for 
this purpose and their reaction times are observed. Experience shows that some animals 
take an extremely long time to react. Reaction times are not normally distributed, but by 
suitable transformations (e.g. logarithmic) they can be made approximately normal. If, 
instead of waiting until all the animals have reacted, the experiment is stopped, when 
a fixed number have reacted or after a certain fixed time, the data will represent censored 
samples of Type II and Type I respectively. 

It is clear from the above two examples that the number of observations in the censored
sample of Type I is a random variable. The calculations of the maximum-likelihood estimates
of the mean and standard deviation in the two types of problem are, however, nearly
identical, and the same table can be used for both purposes. Hald (1949) has constructed
a table appropriate for the estimation problem of Type I. The table given in this paper is,
however, quite different from his, and we consider it more suitable as it leads to less
computation in calculating the estimates. The efficiency of the estimate in Type II must depend
on p, the ratio of the number of observations available to the total number, and in Type I on 
the point of truncation. Unless some predetermined knowledge is available about the 
behaviour of the variate, it is difficult to fix the point of truncation before commencing the 
experiment so as to ensure that the efficiencies of the estimate will be above a certain 
minimum value. This difficulty has always come in the way of the experimenter in using 
the method. No such difficulty, however, arises in Type II. From the graph given in the 
text, showing the efficiency of the estimate as a function of p, it can always be decided 
beforehand when to stop in order to get the required degree of precision. 


2. MAXIMUM-LIKELIHOOD ESTIMATE 


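Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from a normal population with mean μ and standard deviation σ, and let $x_1, x_2, \ldots, x_k$ be the censored sample of size k in which $x_k$ is the greatest observation. The (n - k) censored observations are known to be greater than $x_k$. Apart from a combinatorial factor free of μ and σ, the likelihood function in such a sample is

$$L \propto (\sigma\sqrt{2\pi})^{-k}\exp\Bigl\{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}(x_i-\mu)^2\Bigr\} \times \Bigl[(\sigma\sqrt{2\pi})^{-1}\int_{x_k}^{\infty}\exp\Bigl\{-\frac{1}{2\sigma^2}(x-\mu)^2\Bigr\}dx\Bigr]^{n-k}. \tag{1}$$

The logarithm of the likelihood function can be written as

$$\log L = C - k\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{k}(x_i-\mu)^2 + (n-k)\log\Phi(\eta),$$

where C is a constant,

$$\eta = \frac{x_k-\mu}{\sigma} \quad\text{and}\quad \Phi(\eta) = \frac{1}{\sqrt{2\pi}}\int_{\eta}^{\infty}e^{-\frac12 t^2}\,dt.$$

Let

$$A = \frac{\phi(\eta)}{\Phi(\eta)}, \quad\text{where}\quad \phi(\eta) = \frac{1}{\sqrt{2\pi}}e^{-\frac12\eta^2}.$$

The likelihood equations are therefore

$$\frac{\partial\log L}{\partial\mu} = \frac{k}{\sigma}\Bigl[\frac{\bar{x}-\mu}{\sigma} + \frac{n-k}{k}A\Bigr] = 0 \tag{2}$$

and

$$\frac{\partial\log L}{\partial\sigma} = \frac{k}{\sigma}\Bigl[-1 + \frac{1}{k\sigma^2}\sum_{i=1}^{k}(x_i-\mu)^2 + \frac{n-k}{k}\eta A\Bigr] = 0. \tag{3}$$

Substituting the value of A from (2) in (3) gives

$$-\sigma^2 + s^2 + (\bar{x}-\mu)^2 - (\bar{x}-\mu)(x_k-\mu) = 0$$

or

$$\mu^* = \bar{x} + (\sigma^{*2}-s^2)/d, \tag{4}$$

where

$$\bar{x} = \frac1k\sum_{i=1}^{k}x_i,\qquad s^2 = \frac1k\sum_{i=1}^{k}(x_i-\bar{x})^2 \quad\text{and}\quad d = (x_k-\bar{x}).$$

Thus, given an estimate of σ, μ can easily be obtained from equation (4). Estimates are distinguished by an asterisk throughout.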
In order to estimate σ, we rewrite the likelihood equations in the form

$$\sigma\Bigl(\eta + \frac{1-p}{p}A\Bigr) = d, \tag{5}$$

and

$$\sigma^2 + \eta d\sigma - (s^2+d^2) = 0. \tag{6}$$

Writing $z = \eta + \bigl(\frac1p - 1\bigr)A$, where $p = k/n$, equation (5) gives

$$\sigma = d/z, \tag{7}$$

and substituting the value of σ in (6) we find

$$\psi = \frac{s^2}{s^2+d^2} = \frac{1+\eta z-z^2}{1+\eta z}. \tag{8}$$

The left-hand side of equation (8) contains only sample quantities and can easily be calculated. It is always positive and varies between zero and unity. For a given p, the right-hand side is a function of η. Values of z for different values of ψ are only needed for the estimation problem. Table 1 gives the values of z for p = 0.1 (0.1) 1.0 and ψ = 0.05 (0.05) 0.95 to four significant figures. They should all be correct except for a rounding-off error in the last decimal place. When p = 1.0, z = η and

$$\psi = \frac{1}{1+z^2}. \tag{9}$$

The column with p = 1.0 has been included in the table to facilitate interpolation in the interval p = 0.9 to 1.0.
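Where a value of z is wanted for a pair (p, ψ) not in the table, equation (8) can also be solved numerically. The following is a minimal sketch under stated assumptions: it uses plain bisection on η, searches only the range of η for which z is positive, and presumes ψ lies in the tabulated range 0.05 to 0.95. The function names (`solve_z`, `z_of_eta`, `bisect`) are hypothetical.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_tail(t):
    """Upper-tail probability of the standard normal (the paper's Phi)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def z_of_eta(eta, p):
    """z = eta + (1/p - 1) A, with A = phi(eta) / Phi(eta)."""
    a = phi(eta) / upper_tail(eta)
    return eta + (1.0 / p - 1.0) * a

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def solve_z(p, psi):
    """Solve equation (8) for z, given p = k/n and psi = s^2 / (s^2 + d^2)."""
    # z must be positive because sigma = d / z; start where z crosses zero.
    eta0 = bisect(lambda e: z_of_eta(e, p), -10.0, 0.0) if p < 1.0 else 0.0

    def g(eta):
        z = z_of_eta(eta, p)
        # Equation (8) rearranged: (1 - psi)(1 + eta z) - z^2 = 0.
        return (1.0 - psi) * (1.0 + eta * z) - z * z

    return z_of_eta(bisect(g, eta0, 8.0), p)

print(round(solve_z(0.5, 0.5), 4), round(solve_z(1.0, 0.05), 5))
```

For p = 1 the routine reduces to equation (9), so `solve_z(1.0, 0.5)` returns unity; for p = 0.5, ψ = 0.5 it agrees with the tabulated 0.6909 to about three decimals.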

In order to estimate μ and σ from the censored data, ψ is calculated from the sample and z read from Table 1 corresponding to the appropriate value of p. Equation (7) will then give an estimate of σ and, when σ* is known, equation (4) will give an estimate of μ. A numerical example appears in §6. In small samples, these estimates are likely to be biased.


Table 1. Values of z for values of ψ and p

                                        p
 ψ     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0

0.05  0.5449  0.6656  0.7811  0.9056  1.0498  1.2277  1.4646  1.8157  2.4459  4.35890
0.10  0.5374  0.6528  0.7616  0.8769  1.0076  1.1645  1.3652  1.6439  2.0854  3.00000
0.15  0.5294  0.6395  0.7417  0.8482  0.9665  1.1049  1.2758  1.5010  1.8267  2.38048
0.20  0.5210  0.6256  0.7212  0.8193  0.9262  1.0483  1.1944  1.3785  1.6271  2.00000

0.25  0.5119  0.6110  0.7002  0.7902  0.8865  0.9941  1.1193  1.2711  1.4653  1.73205
0.30  0.5022  0.5957  0.6786  0.7608  0.8473  0.9419  1.0492  1.1752  1.3292  1.52753
0.35  0.4918  0.5796  0.6562  0.7310  0.8083  0.8913  0.9832  1.0881  1.2116  1.36277
0.40  0.4806  0.5626  0.6329  0.7006  0.7694  0.8419  0.9205  1.0079  1.1076  1.22474
0.45  0.4685  0.5445  0.6087  0.6695  0.7303  0.7933  0.8603  0.9330  1.0138  1.10554

0.50  0.4552  0.5251  0.5833  0.6375  0.6909  0.7453  0.8020  0.8624  0.9278  1.00000
0.55  0.4405  0.5044  0.5565  0.6044  0.6508  0.6973  0.7450  0.7948  0.8476  0.90453
0.60  0.4244  0.4819  0.5281  0.5698  0.6097  0.6490  0.6887  0.7295  0.7718  0.81650
0.65  0.4063  0.4574  0.4976  0.5335  0.5672  0.6000  0.6326  0.6654  0.6991  0.73380
0.70  0.3858  0.4302  0.4647  0.4949  0.5228  0.5496  0.5758  0.6018  0.6280  0.65465

0.75  0.3621  0.3998  0.4285  0.4531  0.4757  0.4969  0.5174  0.5375  0.5574  0.57735
0.80  0.3342  0.3650  0.3878  0.4072  0.4246  0.4408  0.4562  0.4711  0.4857  0.50000
0.85  0.3001  0.3237  0.3408  0.3551  0.3677  0.3793  0.3901  0.4005  0.4104  0.42008
0.90  0.2561  0.2722  0.2836  0.2930  0.3011  0.3084  0.3152  0.3215  0.3276  0.33333
0.95  0.1921  0.2004  0.2062  0.2107  0.2147  0.2181  0.2212  0.2242  0.2269  0.22942


3. VARIANCES OF THE MAXIMUM-LIKELIHOOD ESTIMATES

The variances and covariances of the maximum-likelihood estimates are approximated by calculating the expected values of the following quantities:

$$-\frac{\partial^2\log L}{\partial\mu^2} = \frac{n}{\sigma^2}\{p + (1-p)A(A-\eta)\},$$

$$-\frac{\partial^2\log L}{\partial\mu\,\partial\sigma} = \frac{2k}{\sigma^2}\Bigl(\frac{\bar{x}-\mu}{\sigma}\Bigr) + (n-k)\frac{A}{\sigma^2}\{1+\eta(A-\eta)\}, \tag{10}$$

$$-\frac{\partial^2\log L}{\partial\sigma^2} = -\frac{k}{\sigma^2} + \frac{3}{\sigma^2}\sum_{i=1}^{k}\Bigl(\frac{x_i-\mu}{\sigma}\Bigr)^2 + (n-k)\frac{\eta A}{\sigma^2}\{2+\eta(A-\eta)\}.$$

The exact expected values of these quantities cannot easily be evaluated in small samples, hence large sample approximations have been obtained in the limit when n → ∞, p being fixed.
Table 2. The variances and covariances of the maximum-likelihood estimates from censored sample in terms of σ²/n

 p      σ11        σ12        σ22
0.05  51.57600   27.22429   15.68796
0.10  17.79459   10.62002    7.51418
0.15   9.25870    5.85259    4.84903
0.20   5.78039    3.71733    3.53748
0.25   4.02381    2.54931    2.76048
0.30   3.01994    1.83219    2.24800
0.35   2.39819    1.35740    1.88531
0.40   1.99085    1.02593    1.61549
0.45   1.71281    0.78527    1.40712
0.50   1.51709    0.60523    1.24145
0.55   1.37607    0.46736    1.10662
0.60   1.27266    0.35982    0.99476
0.65   1.19583    0.27471    0.90043
0.70   1.13826    0.20657    0.81975
0.75   1.09492    0.15156    0.74989
0.80   1.06232    0.10690    …
0.85   1.03797    …          …
0.90   1.02008    0.04113    0.58592
0.95   1.00752    0.01759    0.54174
0.96   1.00559    0.01355    0.5333
0.97   1.00384    0.00974    …
0.98   1.00230    0.00617    0.51669
0.99   1.00099    0.00287    0.50843
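As n → ∞, η tends to the solution of the equation

$$\int_{-\infty}^{\eta}\phi(t)\,dt = p. \tag{11}$$

Furthermore,

$$E\Bigl(\frac{\bar{x}-\mu}{\sigma}\Bigr) = \frac1p\int_{-\infty}^{\eta}\phi(t)\,t\,dt = -\frac1p\phi(\eta)$$

and

$$E\Bigl\{\frac1k\sum_{i=1}^{k}\Bigl(\frac{x_i-\mu}{\sigma}\Bigr)^2\Bigr\} = \frac1p\int_{-\infty}^{\eta}\phi(t)\,t^2\,dt = 1 - \frac1p\eta\phi(\eta).$$

Hence

$$\lim_{n\to\infty}\frac{\sigma^2}{n}E\Bigl(-\frac{\partial^2\log L}{\partial\mu^2}\Bigr) = p + \phi(\eta)(A-\eta) = v_{11},$$

$$\lim_{n\to\infty}\frac{\sigma^2}{n}E\Bigl(-\frac{\partial^2\log L}{\partial\mu\,\partial\sigma}\Bigr) = \phi(\eta)\{\eta(A-\eta)-1\} = v_{12}, \tag{12}$$

and

$$\lim_{n\to\infty}\frac{\sigma^2}{n}E\Bigl(-\frac{\partial^2\log L}{\partial\sigma^2}\Bigr) = 2p - \eta\phi(\eta) + \eta^2\phi(\eta)(A-\eta) = v_{22}.$$

For values of p = 0.05 (0.05) 0.95 (0.01) 0.99, $v_{ij}$ (i, j = 1, 2) are calculated and Table 2 gives $\sigma_{ij}$ (i, j = 1, 2), where

$$[\sigma_{ij}] = [v_{ij}]^{-1}. \tag{13}$$

The large sample approximation to the covariance matrix of the estimates is therefore given by

$$\frac{\sigma^2}{n}[\sigma_{ij}]. \tag{14}$$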
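The entries of Table 2 can be regenerated for any p from the limiting forms of the quantities in (10). The sketch below is illustrative only: it assumes $v_{11} = p + \phi(\eta)(A-\eta)$, $v_{12} = \phi(\eta)\{\eta(A-\eta)-1\}$ and $v_{22} = 2p - \eta\phi(\eta) + \eta^2\phi(\eta)(A-\eta)$, with $A = \phi(\eta)/(1-p)$ and η the p-quantile of N(0, 1), and inverts the 2 × 2 matrix directly. The function name `asymptotic_cov` is hypothetical.

```python
from statistics import NormalDist

def asymptotic_cov(p):
    """Return (sigma_11, sigma_12, sigma_22), in units of sigma^2 / n, for given p = k/n."""
    nd = NormalDist()
    eta = nd.inv_cdf(p)          # solution of equation (11)
    phi = nd.pdf(eta)
    a = phi / (1.0 - p)          # A = phi(eta) / upper-tail probability
    v11 = p + phi * (a - eta)
    v12 = phi * (eta * (a - eta) - 1.0)
    v22 = 2.0 * p - eta * phi + eta**2 * phi * (a - eta)
    det = v11 * v22 - v12**2
    # Inverse of the symmetric 2x2 matrix [v_ij], as in (13).
    return v22 / det, -v12 / det, v11 / det

s11, s12, s22 = asymptotic_cov(0.5)
print(round(s11, 5), round(s12, 5), round(s22, 5))
```

At p = 0.5 this reproduces the tabulated row 1.51709, 0.60523, 1.24145, and as p approaches unity the values tend to the complete-sample limits 1 and 0.5.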


Fig. 1 shows the efficiencies of these estimates for different values of p where efficiency is 
defined as the ratio of (i) variance of the maximum-likelihood estimate from the complete 
sample to (ii) variance of the maximum-likelihood estimate from the censored sample. 
Fig. 1. The efficiencies of the linear and maximum-likelihood estimates of the constants
of a normal distribution for different values of p.


4. COMPARISON WITH HALD'S RESULT
Hald (1949) has considered the case when the sample is censored so that no observations 
appear below a known truncation point. We have considered the case when (n — k) censored 
observations in a sample of n are greater than uncensored data. In order to make the 
comparison clear we have considered his case when the censored observations lie above the 


truncation point. His equations have therefore been changed accordingly and expressed in 
our notation. 


If we denote the given truncation point by x», the number of observations in the censored 
sample (a variable ) by k, and put 


No = = and d, = (%)—2), 


the results he obtained are equivalent in our notation to 


1 
_ +a a 1+ nol to+ (5-1) A,| 





ested 3 


(A, being the same function of 7 as A is of 7). 
je (5-1) 44


he got Y = 9(P, No) (0+ 9(P> No)}- (17) 
He introduced the function 


With IP; No) = 





(16) 


dy = F(p, 0) = 49(P: 0) {20+ 9(P: No)}; (18) 
and tabulated the inverse function 4 =/(p, 4y) (19) 


for different values of p and y in his Table 3. In his Table 4 he has tabulated $A_0$ for different values of $\eta_0$.

In order to obtain the maximum-likelihood estimate of μ and σ by his method, y is calculated from the sample and $\eta_0$ read from his Table 3 corresponding to the observed value of p. With $\eta_0$ known the value of $A_0$ is read from his Table 4 and $g(p, \eta_0)$ calculated. Finally, σ is estimated from

$$\sigma^* = d_0\,g(p, \eta_0), \tag{20}$$

and with σ* known

$$\mu^* = x_0 - \sigma^*\eta_0. \tag{21}$$


It is evident that his results differ from ours only in replacing $x_0$, the truncation point, by $x_k$, the greatest observation in the sample, throughout. For large n the problems are essentially the same.

The elements of the variance matrix of the estimates have also been approximated in large samples by Hald. Our results have been derived in essentially the same way. He has tabulated the variances and covariances of μ* and σ* for different values of $\eta_0$. We have tabulated them for different values of p, which is fixed in our case. The tables are therefore quite different.

Our computational procedure for estimating μ* and σ* is thus somewhat more rapid than that given by Hald. The equation equivalent to (4) has been overlooked by him. In his case it comes out as

$$\mu^* = \bar{x} + \frac{\sigma^{*2}-s^2}{d_0}.$$

With this known, the table given in this paper seems more convenient and it can also be used for obtaining the estimates in his case. His Table 3 is also not complete as it is given only up to p = 0.8.


5. LINEAR ESTIMATES IN SMALL SAMPLE 


In §2 we have derived the maximum-likelihood estimates of μ and σ, and in §3 the asymptotic variances of the estimates. These estimates are consistent and efficient, and so for large n the problem of estimation is completely solved, except for difficulties which may arise when the data are grouped. When n is small the maximum-likelihood estimates may, of course, be biased and the asymptotic formulae for variances and covariances do not strictly apply. Pending detailed investigation on these points and because of the general interest of the results themselves, we derive in this section 'best linear estimates' which are unbiased, however small n may be.


5.1. The best linear estimate


Let $x_1, x_2, \ldots, x_k$ be the censored sample of size k out of the complete sample of size n. The (n - k) censored values are known to be greater than these values. Let the k observations be rearranged and relettered so that the rth smallest value amongst them is represented by $x_{r|n}$. Thus

$$x_{1|n} < x_{2|n} < \dots < x_{k|n}. \tag{22}$$


The problem is to estimate from these k order statistics the mean and standard deviation of the sampled normal population.

Let $\mu_i$ be the expected value of the ith order statistic from a normal population with mean zero and unit standard deviation, N(0, 1). If $x_{i|n}$ be the ith order statistic from N(μ, σ),

$$E(x_{i|n}) = \mu + \sigma\mu_i. \tag{23}$$

Let the variance matrix of the order statistics from N(0, 1) be denoted by V; then σ²V is the variance matrix of the order statistics from N(μ, σ).

Now equation (23) for i = 1, 2, ..., k in matrix notation can be written as

$$E(y) = B\theta, \tag{24}$$

where

$$y = \begin{pmatrix}x_{1|n}\\ x_{2|n}\\ \vdots\\ x_{k|n}\end{pmatrix},\qquad B = \begin{pmatrix}1 & \mu_1\\ 1 & \mu_2\\ \vdots & \vdots\\ 1 & \mu_k\end{pmatrix},\qquad \theta = \begin{pmatrix}\mu\\ \sigma\end{pmatrix}. \tag{25}$$

The vector y has a variance matrix σ²V. The extended principle of least squares (Aitken, 1934) states that the linear unbiased estimate of θ, with elements which have minimum variance, is the value of θ minimizing

$$Q = (y-B\theta)'\,V^{-1}(y-B\theta), \tag{26}$$

where $(y-B\theta)'$ is the transpose of the matrix $(y-B\theta)$, and $V^{-1}$ is the inverse of the matrix V. Differentiating (26) with respect to μ and σ and equating to zero gives

$$\theta^* = (B'V^{-1}B)^{-1}B'V^{-1}y, \tag{27}$$

where θ* is the best linear estimate of θ.

The variance matrix of the best linear estimates of the parameters is given by

$$\operatorname{var}(\theta^*) = \sigma^2(B'V^{-1}B)^{-1}. \tag{28}$$
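Equations (27) and (28) can be illustrated on the smallest case, n = k = 2, where the moments of the order statistics of N(0, 1) are known in closed form: the expected values are ∓1/√π, the variances 1 - 1/π and the covariance 1/π. These moments are standard results and are not quoted in the text. The sketch below is a plain-Python illustration with hypothetical helper names (`solve`, `best_linear_coefficients`), not the computing scheme actually used for the tables.

```python
import math

def solve(a, b):
    """Solve a x = b (a square, b a list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def transpose(x):
    return [list(col) for col in zip(*x)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def best_linear_coefficients(mu, V):
    """Rows of (B'V^-1 B)^-1 B'V^-1, as in (27), and the matrix (B'V^-1 B)^-1 of (28)."""
    B = [[1.0, m] for m in mu]
    Vinv_B = solve(V, B)                       # V^-1 B
    M = matmul(transpose(B), Vinv_B)           # B' V^-1 B
    coeffs = solve(M, transpose(Vinv_B))       # uses the symmetry of V
    cov = solve(M, [[1.0, 0.0], [0.0, 1.0]])
    return coeffs[0], coeffs[1], cov

m1 = 1.0 / math.sqrt(math.pi)
mu = [-m1, m1]
V = [[1 - 1 / math.pi, 1 / math.pi], [1 / math.pi, 1 - 1 / math.pi]]
beta, gamma, cov = best_linear_coefficients(mu, V)
print([round(c, 5) for c in gamma], round(cov[0][0], 5), round(cov[1][1], 5))
```

The coefficients for σ come out as ∓0.88623 and the variances as 0.50000 and 0.57080 in units of σ², in agreement with the n = 2 entries of Tables 4, 5 and 6.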


The expected values and all the elements of the variance matrix of the order statistics from N(0, 1) are known up to the sample size n = 10. For different values of k = 2, 3, ..., n, coefficients have been calculated from (27) to construct the best linear estimates of μ and σ. In Table 3 values of the $\beta_i$'s are given for each n = 2, 3, ..., 10 with k = 2, 3, ..., n - 1, such that μ* can be obtained from the equation

$$\mu^* = \sum_{i=1}^{k}\beta_i x_{i|n} \qquad (k = 2, 3, \ldots, n-1;\ n = 2, 3, \ldots, 10). \tag{29}$$

Similarly, in Table 4 values of the $\gamma_i$'s are given for each n = 2, 3, ..., 10 with k = 2, 3, ..., n and these give

$$\sigma^* = \sum_{i=1}^{k}\gamma_i x_{i|n} \qquad (k = 2, 3, \ldots, n;\ n = 2, 3, \ldots, 10). \tag{30}$$

Tables 5 and 6 give, in terms of σ², the variances of μ* and σ* respectively for each of the above combinations of k and n.

Godwin (1949a) has given the coefficients for constructing the best linear estimate of σ from the normal population up to the sample size n = 10. His is the special case of the above problem when k = n. His results, calculated in essentially the same way, tally very well with those given in Table 4 when k = n. In Table 3 the case when k = n has been omitted, for in that case the best linear estimate turns out to be $\bar{x}$, as expected, and the values of all the β's are 1/n.


Table 3. Coefficients β_i in the most efficient linear estimate of the mean in samples
of normal population (equation (29))

 n  k     β1        β2        β3        β4        β5        β6        β7        β8        β9
10  2  -1.86335   2.86335
    3  -0.65963  -0.21376   1.87339
    4  -0.29233  -0.07091   0.03062   1.33264
    5  -0.12397  -0.00164   0.05496   0.09880   0.97184
    6  -0.03158   0.03833   0.07071   0.09614   0.11853   0.70788
    7   0.02441   0.06362   0.08176   0.09612   0.10887   0.12075   0.50448
    8   0.06047   0.08036   0.08995   0.09709   0.10371   0.10999   0.11603   0.34241
    9   0.08432   0.09211   0.09563   0.09860   0.10110   0.10357   0.10599   0.10854   0.21014
 9  2  -1.68676   2.68676
    3  -0.56639  -0.15219   1.71858
    4  -0.22717  -0.02847   0.06448   1.19115
    5  -0.07311   0.03153   0.08086   0.11998   0.84076
    6   0.01042   0.06591   0.09238   0.11324   0.13211   0.58594
    7   0.06025   0.08751   0.10063   0.11099   0.12039   0.12934   0.39089
    8   0.09149   0.10175   0.10666   0.11057   0.11421   0.11764   0.12121   0.23647
 8  2  -1.49153   2.49153
    3  -0.46317  -0.08554   1.54871
    4  -0.15491   0.01761   0.10012   1.03718
    5  -0.01672   0.06765   0.10840   0.14133   0.69934
    6   0.05692   0.09621   0.11532   0.13090   0.14512   0.45552
    7   0.09967   0.11387   0.12079   0.12649   0.13177   0.13698   0.27043
 7  2  -1.27332   2.27332
    3  -0.34745  -0.01345   1.36090
    4  -0.07380   0.06772   0.13752   0.86856
    5   0.04655   0.10721   0.13748   0.16260   0.54616
    6   0.10882   0.12955   0.13997   0.14874   0.15705   0.31587
 6  2  -1.02607   2.02607
    3  -0.21592   0.06485   1.15107
    4   0.01848   0.12261   0.17615   0.68276
    5   0.11829   0.15097   0.16803   0.18280   0.37990
 5  2  -0.74111   1.74111
    3  -0.06377   0.14983   0.91395
    4   0.12516   0.18305   0.21472   0.47708
 4  2  -0.40555   1.40555
    3   0.11607   0.24084   0.64310
 3  2   0.00000   1.00000


If the observations are the k largest values out of n, namely, $x_{1|n} > x_{2|n} > \dots > x_{k|n}$, and if the mean and standard deviation are to be estimated from these k order statistics, then the coefficients for constructing the best linear estimate of the mean will be identical with those given in Table 3. For the case of σ the coefficients will be numerically the same but with opposite sign.

5.2. An alternative linear estimate


We have calculated the coefficients for constructing the best linear estimate of the mean and standard deviation from censored data up to n = 10. For n slightly larger than 10, the maximum-likelihood estimates might be still unsatisfactory, because biased. The variances and covariances of the order statistics from N(0, 1) are not available for n > 10 (although we understand they have been computed at the National Bureau of Standards up to n = 20) and so such a table of coefficients could not at present be extended. An alternative set of less efficient linear estimates is therefore suggested when n is greater than 10 but is not large enough to use the maximum-likelihood estimates. The coefficients of these linear estimates are obtained by assuming the variance matrix of the order statistics from N(0, 1) to be a unit matrix.


Table 4. Coefficients γ_i in the most efficient unbiased linear estimate of the standard
deviation in samples from a normal population (equation (30))

 n   k     γ1        γ2        γ3        γ4        γ5        γ6        γ7        γ8        γ9        γ10
10   2  -1.86082   1.86082
     3  -0.96246  -0.43570   1.39816
     4  -0.65204  -0.31500  -0.15920   1.12624
     5  -0.49191  -0.24909  -0.13605  -0.04730   0.92435
     6  -0.39303  -0.20632  -0.11919  -0.05014   0.01103   0.75765
     7  -0.32526  -0.17569  -0.10582  -0.05017  -0.00067   0.04699   0.61063
     8  -0.27527  -0.15249  -0.09447  -0.04883  -0.00782   0.03208   0.07215   0.47465
     9  -0.23643  -0.13335  -0.08522  -0.04637  -0.01207   0.02163   0.05581   0.09370   …
    10  -0.20442  -0.11715  -0.07626  -0.04353  -0.01432   0.01432   0.04353   0.07626   …         …
 9   2  -1.80925   1.80925
     3  -0.93550  -0.40780   1.34030
     4  -0.63300  -0.29447  -0.13472   1.06219
     5  -0.47658  -0.23355  -0.11808  -0.02549   …
     6  -0.37968  -0.19367  -0.10472  -0.03330   …         0.67966
     7  -0.31289  -0.16472  -0.09366  -0.03632   …         0.06770   0.52390
     8  -0.26326  -0.14211  -0.08408  -0.03698   …         0.04911   0.09545   …
     9  -0.22373  -0.12324  -0.07518  -0.03593   …         0.03593   0.07518   …         …
 8   2  -1.75016   1.75016
     3  -0.90454  -0.36896   1.27350
     4  -0.61096  -0.27071  -0.10613   0.98780
     5  -0.45862  -0.21555  -0.09700   0.00023   …
     6  -0.36376  -0.17876  -0.08808  -0.01320   …         0.58682
     7  -0.29776  -0.15151  -0.07964  -0.02001   …         0.09505   …
     8  -0.24759  -0.12945  -0.07131  -0.02296   …         0.07131   …         …
 7   2  -1.68123   1.68123
     3  -0.86817  -0.32690   1.19507
     4  -0.58481  -0.24284  -0.07174   0.89940
     5  -0.43696  -0.19433  -0.07179   0.03213   0.67095
     6  -0.34400  -0.16098  -0.06808   0.01144   0.09007   …
     7  -0.27781  -0.13510  -0.06246   0.00000   0.06246   …         …
 6   2  -1.59885   1.59885
     3  -0.82436  -0.27604   1.10040
     4  -0.55282  -0.20914  -0.02897   0.79093
     5  -0.40969  -0.16846  -0.04061   0.07395   0.54481
     6  -0.31752  -0.13856  -0.04321   0.04321   0.13856   …
 5   2  -1.49713   1.49713
     3  -0.76958  -0.21212   0.98170
     4  -0.51173  -0.16678   0.02740   …
     5  -0.37238  -0.13521   0.00000   …         0.37238
 4   2  -1.36544   1.36544
     3  -0.69713  -0.12682   0.82395
     4  -0.45394  -0.11018   0.11018   …
 3   2  -1.18164   1.18164
     3  -0.59082   0.00000   0.59082
 2   2  -0.88623   0.88623
Table 5. Variance of the best linear estimate of the mean in terms of σ² for
different values of n and k

                                          k
 n      2        3        4        5        6        7        8        9        10
10   1.12689  0.41734  0.23659  0.16641  0.13359  0.11666  0.10749  0.10250  0.10000
 9   1.03134  0.38536  0.22407  0.16291  0.13515  0.12140  0.11442  0.11111
 8   0.93100  0.35410  0.21376  0.16225  0.13988  0.12954  0.12500
 7   0.82636  0.32480  0.20714  0.16600  0.14942  0.14286
 6   0.71864  0.29985  0.20683  0.17688  0.16667
 5   0.61123  0.28393  0.21772  0.20000
 4   0.51299  0.28701  0.25000
 3   0.44867  0.33333
 2   0.50000

Table 6. Variance of the best linear estimate of the standard deviation in terms
of σ² for different values of n and k

                                          k
 n      2        3        4        5        6        7        8        9        10
10   0.74911  0.35389  0.22480  0.16131  0.12372  0.09892  0.08129  0.06806  0.05759
 9   0.74230  0.34941  0.22116  0.15810  0.12075  0.09605  0.07843  0.06501
 8   0.73417  0.34408  0.21679  0.15419  0.11707  0.09242  0.07461
 7   0.72430  0.33753  0.21136  0.14928  0.11233  0.08750
 6   0.71193  0.32920  0.20436  0.14277  0.10570
 5   0.69571  0.31809  0.19476  0.13332
 4   0.67303  0.30208  0.18005
 3   0.63783  0.27548
 2   0.57080
Let the linear estimates be

$$\mu'^* = \sum_{i=1}^{k} b_i x_{i|n} \qquad (k = 2, 3, \ldots, n), \tag{31}$$

and

$$\sigma'^* = \sum_{i=1}^{k} c_i x_{i|n} \qquad (k = 2, 3, \ldots, n), \tag{32}$$

where

$$b_i = \frac1k - \frac{\bar\mu_k(\mu_i-\bar\mu_k)}{\sum_{i=1}^{k}(\mu_i-\bar\mu_k)^2}, \tag{33}$$

$$c_i = \frac{(\mu_i-\bar\mu_k)}{\sum_{i=1}^{k}(\mu_i-\bar\mu_k)^2}, \tag{34}$$

and

$$\bar\mu_k = \frac1k\sum_{i=1}^{k}\mu_i. \tag{35}$$

The $\mu_i$'s, as defined previously, are the expected values of the order statistics from N(0, 1). The coefficients for any unbiased linear estimate of the mean will satisfy the condition $\sum b_i = 1$ and those for σ, $\sum c_i = 0$. These conditions are satisfied by the new coefficients. The linear estimates are therefore unbiased. The estimates given in (31) and (32) can be represented in matrix notation as

$$\mu'^* = b'y \quad\text{and}\quad \sigma'^* = c'y,$$

where b' and c' represent the vectors $[b_1, b_2, \ldots, b_k]$ and $[c_1, c_2, \ldots, c_k]$ respectively. In matrix notation the variances of the estimates can also be written as

$$\operatorname{var}(\mu'^*) = \sigma^2\,b'Vb, \tag{36}$$

and

$$\operatorname{var}(\sigma'^*) = \sigma^2\,c'Vc, \tag{37}$$

where b and c are the transposes of the matrices b' and c' respectively. The variances of these linear estimates have been calculated for k = 2, 3, ..., n and n = 2, 3, ..., 10. The efficiencies of these estimates, on comparing with the best linear estimates, are found to be very high in almost all cases. When k = n the estimate of σ given by (32) for each n = 2, 3, ..., 10 has an efficiency of 99.9 %. Table 7 gives the variances and the efficiencies of the alternative linear estimate for the case n = 10 with k = 2, 3, ..., 10. The efficiencies are evidently very high in all cases. If the reader cares to derive $b_i$ and $c_i$ from, say, Fisher and Yates's table of $\mu_i$ for n > 10, Table 7 is the best guide available to the accuracy of the resulting estimates.
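The coefficients (33) and (34) are simply those of an ordinary least-squares line fitted to the observed order statistics against their expected values. A minimal sketch follows, applied to the mouse data of example (b) in §6 (first 7 deaths out of 10). The expected normal order statistics for n = 10 used here are standard tabulated values and are an assumption in the sense that they are not quoted in the text; the function name `alternative_coefficients` is hypothetical.

```python
import math

def alternative_coefficients(mu):
    """Coefficients b_i and c_i of equations (33) and (34)."""
    k = len(mu)
    mean = sum(mu) / k                          # equation (35)
    ss = sum((m - mean) ** 2 for m in mu)
    c = [(m - mean) / ss for m in mu]
    b = [1.0 / k - mean * ci for ci in c]
    return b, c

# Expected values of the 7 smallest order statistics in a sample of 10 from N(0, 1)
# (standard tabulated values, assumed here).
mu10 = [-1.53875, -1.00136, -0.65606, -0.37576, -0.12267, 0.12267, 0.37576]

days = [41, 44, 46, 54, 55, 58, 60]             # data of example (b), §6
y = [math.log10(d) for d in days]

b, c = alternative_coefficients(mu10)
mu_est = sum(bi * yi for bi, yi in zip(b, y))
sigma_est = sum(ci * yi for ci, yi in zip(c, y))
print([round(v, 4) for v in b])
print([round(v, 4) for v in c])
print(round(mu_est, 3), round(sigma_est, 3))
```

The coefficients sum to one and to zero respectively, as unbiasedness requires, and the two estimates agree with the values 1.748 and 0.094 quoted in example (b).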


Table 7. Variances and efficiencies of the alternative linear estimates of
equations (31)-(34), for the sample size n = 10

       Variance of the estimate      Efficiency of the
       in terms of σ²                estimate (%)
                   Standard                     Standard
 k     Mean        deviation         Mean       deviation
 2     1.12689     0.74911           100.0      100.0
 3     0.45613     0.38065            91.4       93.0
 4     0.26336     0.24796            89.4       90.7
 5     0.18261     0.17889            91.1       90.2
 6     0.14282     0.13638            93.5       90.7
 7     0.12149     0.10749            96.0       92.0
 8     0.10963     0.08642            98.0       94.1
 9     0.10310     0.07023            99.4       96.9
10     0.10000     0.05766           100.0       99.9


6. NUMERICAL EXAMPLES 


(a) The following distribution gives the lives in hours of 119 electric lamps out of 300. 
It represents a ‘tail’ of the complete distribution. 


Life in hours x Frequency 

950-1000 2 
1000-1050 2 
1050-1100 3 
1100-1150 6 
1150-1200 7 
1200-1250 12 
1250-1300 16 
1300-1350 20 
1350-1400 24 
1400-1450 27 
Total 119 


n = 300, k = 119; hence p = 0.3966,
$\bar{x}$ = 1304.832, s² = 12198.250 and d = 145.168.

Therefore

$$\psi = \frac{s^2}{s^2+d^2} = 0.3666.$$

Table 1 gives the value of z = 0.7184. Hence

σ* = d/z = 202.06 hr.

and

μ* = $\bar{x}$ + (σ*² - s²)/d
   = 1304.83 + 197.22 = 1502.05 hr.
Table 2 gives the values of $\sigma_{11}$ and $\sigma_{22}$, and the variances of the estimates are calculated as

$$v(\mu^*) = \frac{\sigma^{*2}}{n}\sigma_{11} = \frac{202.06^2}{300}\times 2.0185 = 274.7059,$$

$$v(\sigma^*) = \frac{\sigma^{*2}}{n}\sigma_{22} = \frac{202.06^2}{300}\times 1.63386 = 222.3587.$$

The estimated standard errors of μ* and σ* are therefore 16.57 and 14.91 hr. respectively.
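The arithmetic of example (a) can be retraced in a few lines. This is a sketch only: z is taken as the value 0.7184 read from Table 1, and $\sigma_{11}$, $\sigma_{22}$ are obtained by linear interpolation between the p = 0.35 and p = 0.40 rows of Table 2, which is an assumption about how the quoted factors were reached. The helper name `interpolate` is hypothetical.

```python
import math

n, k = 300, 119
xbar, s2, d = 1304.832, 12198.250, 145.168
z = 0.7184                          # read from Table 1 for p = 0.3966, psi = 0.3666

p = k / n
psi = s2 / (s2 + d * d)             # left-hand side of equation (8)
sigma = d / z                       # equation (7)
mu = xbar + (sigma**2 - s2) / d     # equation (4)

def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Rows p = 0.35 and p = 0.40 of Table 2 (units of sigma^2 / n).
s11 = interpolate(p, 0.35, 2.39819, 0.40, 1.99085)
s22 = interpolate(p, 0.35, 1.88531, 0.40, 1.61549)

se_mu = math.sqrt(sigma**2 / n * s11)
se_sigma = math.sqrt(sigma**2 / n * s22)
print(round(psi, 4), round(sigma, 2), round(mu, 2), round(se_mu, 2), round(se_sigma, 2))
```

Small differences in the last digit from the figures in the text arise from rounding of z and of p.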
(b) The table below shows the day on which the first 7 of a sample of 10 tested mice died after being inoculated with a uniform culture of human tuberculosis.

Days after inoculation        41     44     46     54     55     58     60    Total
Frequency                      1      1      1      1      1      1      1      7
log days after inoculation   1.613  1.644  1.663  1.732  1.740  1.763  1.778

In this example the estimates of the mean and standard deviation are obtained by three methods, viz. (i) best linear, (ii) alternative linear, and (iii) maximum likelihood. Since reaction time is not likely to be normally distributed, $\log_{10}x$ is taken as the variate, x being the day on which the mice died. The estimates are given in logarithmic scale.

(i) Best linear estimate. The coefficients $\beta_i$ and $\gamma_i$ are taken from Tables 3 and 4 respectively,
and they give:

$$\mu^* = 0.0244 \log 41 + 0.0636 \log 44 + 0.0818 \log 46 + 0.0961 \log 54 + 0.1089 \log 55 + 0.1208 \log 58 + 0.5045 \log 60 = 1.746; \quad \text{mean day} = 55.7.$$

$$\sigma^* = -0.3253 \log 41 - 0.1757 \log 44 - 0.1058 \log 46 - 0.0502 \log 54 - 0.0007 \log 55 + 0.0470 \log 58 + 0.6163 \log 60 = 0.101.$$


The variances of the estimates in terms of $\sigma^2$ are obtained from Tables 5 and 6 and they give
S.E. of $\mu^* = \sqrt{0.1167} \times 0.101 = 0.034$,
S.E. of $\sigma^* = \sqrt{0.0989} \times 0.101 = 0.032$.
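The best linear estimates and their standard errors can be retraced as follows. This is only a sketch of the arithmetic: the coefficients and variance factors are the figures quoted in the text for n = 10, k = 7, and the helper name `linear_estimate` is illustrative.

```python
import math

days = [41, 44, 46, 54, 55, 58, 60]
# Coefficients quoted in the text (from Tables 3 and 4 of the paper).
beta = [0.0244, 0.0636, 0.0818, 0.0961, 0.1089, 0.1208, 0.5045]
gamma = [-0.3253, -0.1757, -0.1058, -0.0502, -0.0007, 0.0470, 0.6163]
# Variance factors in units of sigma^2 (from Tables 5 and 6 of the paper).
var_mu, var_sigma = 0.1167, 0.0989

def linear_estimate(coeffs, xs):
    """Weighted sum of the ordered observations."""
    return sum(c * x for c, x in zip(coeffs, xs))

logs = [math.log10(t) for t in days]
mu_star = linear_estimate(beta, logs)
sigma_star = linear_estimate(gamma, logs)
se_mu = math.sqrt(var_mu) * sigma_star
se_sigma = math.sqrt(var_sigma) * sigma_star
mean_day = 10 ** mu_star       # back-transformed to days

print(round(mu_star, 3), round(sigma_star, 3), round(mean_day, 1))
```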


(ii) Alternative linear estimate. The coefficients $b_i$ and $c_i$ are calculated from equations
(31) and (32) respectively and they give:

$$\mu'^* = -0.0433 \log 41 + 0.0491 \log 44 + 0.1085 \log 46 + 0.1568 \log 54 + 0.2003 \log 55 + 0.2425 \log 58 + 0.2861 \log 60 = 1.748; \quad \text{mean day} = 56.0.$$

$$\sigma'^* = -0.4077 \log 41 - 0.2053 \log 44 - 0.0752 \log 46 + 0.0305 \log 54 + 0.1258 \log 55 + 0.2183 \log 58 + 0.3136 \log 60 = 0.094.$$











Table 7 gives the variances of the estimates in terms of $\sigma^2$, hence
S.E. of $\mu'^* = \sqrt{0.1215} \times 0.094 = 0.033$,
S.E. of $\sigma'^* = \sqrt{0.1075} \times 0.094 = 0.031$.

(iii) Maximum-likelihood estimate. Here n = 10, k = 7, hence p = 0.7,
$s^2 = 0.002539$ and d = 0.0734.
Hence $\psi = 0.3227$, and we find that z = 1.0192 from Table 1. The estimates are therefore
$\mu^* = 1.742$, $\sigma^* = 0.072$.

The standard errors of the estimates are calculated in the same way as in example (a),
and they are S.E. of $\mu^* = 0.024$ and S.E. of $\sigma^* = 0.021$.


7. CALCULATION 


The main bulk of the calculation in finding the coefficients of the best linear estimates lies
in computing the inverse of the variance matrix. If we represent ${}_nV_k$ as the variance
matrix, of order $k \times k$, of the k order statistics of the sample size n, and ${}_nV_{k-j}$ as the matrix
of order $(k-j) \times (k-j)$ obtained from ${}_nV_k$ by eliminating the last j rows and columns of
${}_nV_k$, then ${}_nV_{k-j}$ is the variance matrix of the $(k-j)$ order statistics from the same sample
size n. Our problem therefore is to calculate ${}_nV_k^{-1}$ for k = 2, 3, ..., n and n = 2, 3, ..., 10.

The method due to Cholesky (1924) for calculating the inverse of a symmetric matrix
is found to be the most convenient for the above purpose, as it greatly reduces the computing
labour. This method employs the fact that any positive-definite symmetric
matrix V can be expressed in the form $V = T'T$, where T is an upper triangular
matrix and $T'$ its transpose. Then $V^{-1} = T^{-1}(T^{-1})'$, where $T^{-1}$ is also upper triangular.

Let $\sigma_{ij}$, $t_{ij}$ and $t^{ij}$ be the elements of the ith row and jth column of the matrices ${}_nV_n$, ${}_nT_n$
and ${}_nT_n^{-1}$ respectively. The elements $t_{ij}$ ($j \ge i = 1, 2, \ldots, n$) of ${}_nT_n$ can be calculated from the
elements $\sigma_{ij}$ ($i, j = 1, 2, \ldots, n$) of the variance matrix by the formulae given by Cholesky.
The elements $t^{ij}$ ($j \ge i = 1, 2, \ldots, n$) of ${}_nT_n^{-1}$ are preferably obtained column by column. As
soon as each column is completed it can be checked by the following formulae:

$$\sigma_{11}(t^{11})^2 = 1 \quad \text{and} \quad \sum_{i=1}^{k} \sigma_{1i}\,t^{ik} = 0 \quad (k = 2, 3, \ldots, n). \tag{38}$$

An interesting point to note is that, if we represent ${}_nT^{-1}_{k-j}$ as the matrix obtained from
${}_nT_k^{-1}$ by eliminating the last j rows and columns of ${}_nT_k^{-1}$, then

$${}_nT^{-1}_{k-j} = ({}_nT_{k-j})^{-1}. \tag{39}$$

Once the elements of ${}_nT_n^{-1}$ have been calculated, in order to get the elements of ${}_nV_n^{-1}$, we
have to multiply ${}_nT_n^{-1}$ by its transpose. If we represent ${}_n\sigma^{rs}$ and ${}_k\sigma^{rs}$ as the elements of the
sth row and rth column of the matrices ${}_nV_n^{-1}$ and ${}_nV_k^{-1}$ respectively ($r \ge s$), we have, because
of (39),

$${}_n\sigma^{rs} = \sum_{j=r}^{n} t^{sj}\,t^{rj} \tag{40}$$

and

$${}_k\sigma^{rs} = \sum_{j=r}^{k} t^{sj}\,t^{rj}. \tag{41}$$

Thus while calculating the elements of ${}_nV_n^{-1}$ we get also the elements of ${}_nV_k^{-1}$ for
k = 2, 3, ..., (n-1). The matrix ${}_nV_n$ is symmetrical about both the diagonals. It can easily
be proved that ${}_nV_n^{-1}$ is also symmetrical about both the diagonals. So when all the elements
of ${}_nV_n^{-1}$ have been calculated this property serves as a check in the calculation of the
elements, not only of ${}_nV_n^{-1}$, but also of ${}_nV_k^{-1}$ for k = 2, 3, ..., (n-1).
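The scheme can be sketched in a few lines. The 3 x 3 matrix used here is an arbitrary positive-definite example, not a variance matrix of order statistics, and the function names are illustrative. With $V = T'T$ and T upper triangular, the inverse is formed as $T^{-1}(T^{-1})'$, and truncating the sum at k gives the inverse of the leading $k \times k$ block, as in (40) and (41).

```python
import math

def cholesky_upper(V):
    """Upper triangular T with V = T'T (V symmetric positive definite)."""
    n = len(V)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = V[i][j] - sum(T[k][i] * T[k][j] for k in range(i))
            T[i][j] = math.sqrt(s) if i == j else s / T[i][i]
    return T

def invert_upper(T):
    """Inverse of an upper triangular matrix, built column by column."""
    n = len(T)
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        U[j][j] = 1.0 / T[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(T[i][k] * U[k][j] for k in range(i + 1, j + 1))
            U[i][j] = -s / T[i][i]
    return U

def inverse_of_leading_block(U, k):
    """Inverse of the leading k x k block of V, using only the
    first k rows and columns of U = T^{-1} (equations (40), (41))."""
    return [[sum(U[s][j] * U[r][j] for j in range(max(r, s), k))
             for r in range(k)] for s in range(k)]

# Arbitrary illustrative matrix (an assumption, not data from the paper).
V = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 1.0],
     [1.0, 1.0, 2.0]]
U = invert_upper(cholesky_upper(V))
V_inv = inverse_of_leading_block(U, 3)    # inverse of the full matrix
V2_inv = inverse_of_leading_block(U, 2)   # inverse of the leading 2 x 2 block
print(V_inv)
print(V2_inv)
```

One factorization of the full matrix therefore yields the inverses of all the leading blocks, which is the saving in labour described above.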

The elements of the variance matrix of the order statistics from N(0, 1) are taken from
Godwin (1949b). In that paper the values are given to only five decimal places. Mr Godwin
has kindly sent us his manuscript table which gives the results to eight significant figures
up to the sample size n = 8. All calculations were done with two more figures than
those given in Tables 3-6. They should all be correct up to the sample size n = 8, except
for the rounding-off error in the last figures given. When n = 9 and 10, there is good agreement
between the results of Table 4 when k = 9 and 10 respectively, and those given by
Godwin (1949a). The last decimal place here is not reliable.


In conclusion, the author would like to express his indebtedness to Dr O. L. Davies for 
suggesting the problem; to Mr H. J. Godwin for kindly sending his manuscript table of the 
second-order moments of the order statistics; and he acknowledges gratefully the help 
and guidance which he received from Mr R. L. Plackett, under whose direction the whole 
work was carried out. He also wishes to thank Prof. E. S. Pearson for suggesting a number 
of improvements to the original draft of this paper.


USE OF SCORES FOR THE ANALYSIS OF ASSOCIATION
IN CONTINGENCY TABLES


By E. J. WILLIAMS 


Commonwealth Scientific and Industrial Research Organization, Melbourne 


I. INTRODUCTION 


A number of writers have discussed the analysis of data in the form of contingency tables, 
in which association is known to exist. A useful way of interpreting such association is to 
regard it as being due to the correlation between a pair of variates, corresponding to the 
classes of the two classifications of the table. According to one method, the contingency 
table is regarded as a frequency table for a sample from a bivariate normal population, 
classified into intervals of the two variates, the limits of which are to be estimated.* Each 
class thus corresponds to a range of values of the corresponding variate. The correlation to 
be estimated is the correlation in this population. Though this method may be appropriate 
in many situations where the assumption of normality is justifiable, it involves rather lengthy 
computations, and is not considered here. An alternative method, which has been widely 
used, relies on representing each class of a classification by a single value. For this method, 
two related approaches have been made; one, by Yates (1948), depends on the quantitative 
values for one or both classifications being given; for the other, adopted by Fisher (1940, 
1950), Maung (1941), Bartlett (1951) and others, the values of the two variates are so chosen 
as to maximize the correlation between them. These analyses have the advantage, first, of 
giving tests of association more sensitive than the overall $\chi^2$ test, and secondly, of providing
actual numerical values characterizing the classes of each classification. These class values 
may often be given a practical interpretation. In the present paper, the two approaches 
and the assumptions underlying them are discussed in some detail. 

The method of analysis presented below is formally related to the analysis for the
interpretation of interactions, discussed elsewhere by the author (Williams, 1952b). It is shown
that significance tests, developed for discriminant analysis and for the interpretation of 
interactions, exact when the variates involved are normally distributed, may be applied 
as tests asymptotically exact to contingency tables. 

Matters which are not essential to the main argument of the paper are dealt with in
Appendices 1 and 2.

Reference should be made to other workers whose methods are here adapted and extended.
Lancaster (1949) has shown how $\chi^2$ for a contingency table may be partitioned into additive
parts corresponding to comparisons which are uncorrelated and hence asymptotically
independent. Yates (1948) has shown how an analysis in the form of a regression may be
tested by a $\chi^2$ test, and Cochran (1950) discusses the relative merits of the $\chi^2$ and F tests
applied to the analysis of discrete data.

We consider here a contingency table where the data are classified by two attributes in
p rows and q columns. For convenience, we assume that $p \ge q$. On the assumption that the
two attributes are not associated, the expected frequencies in any row (or column) are
proportional to the marginal frequencies. The assumption of no association may be otherwise
expressed, that the expected frequencies in the table form a matrix of rank 1. Owing
to errors of sampling, the observed frequencies will depart from expectation; in general,
they will form a matrix of rank at most q, and the departures from expectation a matrix
of rank $q-1$. The object of the usual significance tests is to decide whether the departures
are consistent with the assumption that the expectations are of unit rank.

* See the Editorial Note added at the end of the paper.
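As a small illustration, the expected frequencies under no association and the overall $\chi^2$ may be computed as below. The frequencies are those of Table 1 in § VI; the function names are illustrative and the sketch makes no claim to be the computing scheme used by the author.

```python
def expected_frequencies(table):
    """Expected frequencies n_i. * n_.j / n.. under no association (a rank-1 matrix)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

def chi_square(table):
    """Overall chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Frequencies of Table 1 (periodontal condition by calcium intake).
table_1 = [[5, 3, 10, 11],
           [4, 5, 8, 6],
           [26, 11, 3, 6],
           [23, 11, 1, 2]]
expected = expected_frequencies(table_1)
chi2 = chi_square(table_1)
print(round(chi2, 3))
```

Every 2 x 2 minor of `expected` vanishes, which is the sense in which the expectations are of unit rank.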

The significance of association can be tested by the usual $\chi^2$ test. When, however,
association is assumed to exist (either as the result of a significance test, or, preferably, from prior
considerations), it is of practical importance to be able to specify in what this association 
consists. The interpretation of the data is simplest if the association can be accounted for 
satisfactorily in terms of the correlation between a single pair of variates corresponding to 
the two attributes. Such a state of affairs will result when the observed frequencies are 
consistent with an expectation matrix of rank 2; or, what is the same thing, the departures 
from proportionality are consistent with an expectation matrix of rank 1. The higher the 
minimum rank of the matrix with which the observations are consistent, the more difficult 
the data are of interpretation. The association cannot then be interpreted in terms of the 
correlation between a pair of variates, and the existence of several such pairs, while satis- 
factorily representing the data, may be regarded as a rather artificial concept. The successive 
pairs of variates giving to the correlation a maximum value, stationary values in decreasing 
sequence, and a minimum value, are known as canonical variates (Hotelling, 1936). The 
number of such pairs of variates, both for the population and for any sample from it, is 
q—1. The number of population pairs corresponding to non-zero correlations is equal to 
the rank of the matrix of departures from expectation. For the present, however, signi- 
ficance tests will be considered which are appropriate when only one pair of population 
variates exists, as this case is of greatest practical importance. 

On account of discontinuity of the distributions occurring, an exact analysis is not pos- 
sible. The solutions presented here are approximate but, it is believed, adequate for practical 
purposes, especially with moderately large samples. 


II. THE ANALYSIS CORRESPONDING TO ASSIGNED SCORES 


When scores for one or both classifications are given in advance, the solution and significance 
tests have been given by Yates (1948). This case is discussed again here because it is pro- 
posed to generalize the scoring and to use significance tests slightly different from Yates’s. 
The difference would probably be negligible in most examples, but the tests given here have 
certain conceptual advantages, and lead naturally to the tests proposed subsequently for 
the concordance of a proposed set of scores with the data. For corresponding to any set of 
scores we need to test two aspects: the significance of the correlation between them, and the 
significance of the residual association not accounted for by the scores (see Williams, 
1952a). 
We use the following notation:

$n_{ij}$   frequency observed in ith row of jth column,
$n_{i.}$   total frequency in ith row,
$n_{.j}$   total frequency in jth column,
$n_{..}$   total frequency in table,
$\xi_i$   score assigned to ith row of first classification,
$\eta_j$   score assigned to jth column of second classification.

Since the test criteria are independent of the origin and scale of the two sets of scores,
we may subject the $\xi_i$ and $\eta_j$ to the following conditions:

$$\sum_i n_{i.}\xi_i = 0, \qquad \sum_j n_{.j}\eta_j = 0, \tag{1}$$

and

$$\sum_i n_{i.}\xi_i^2 = n_{..}, \qquad \sum_j n_{.j}\eta_j^2 = n_{..}. \tag{2}$$


As a result of the conditions (1), the linear functions

$$\sum_i n_{ij}\xi_i, \qquad \sum_j n_{ij}\eta_j$$

have zero expectation. The regression coefficient of the $\eta_j$ on the $\xi_i$ (or of the $\xi_i$ on the $\eta_j$)
is then

$$r = \sum_i \sum_j n_{ij}\xi_i\eta_j / n_{..},$$

which, on the null hypothesis, has zero expectation. On condition that both sets of marginal
totals are fixed and the null hypothesis holds, the variances of the above linear functions
depend only on the marginal totals. Using conditions (2), we find the following values for
the variances:

$$V\Big(\sum_i n_{ij}\xi_i\Big) = \frac{n_{.j}(n_{..} - n_{.j})}{n_{..} - 1}, \qquad V\Big(\sum_j n_{ij}\eta_j\Big) = \frac{n_{i.}(n_{..} - n_{i.})}{n_{..} - 1}, \tag{3}$$

while

$$V(r) = V\Big(\sum_i \sum_j n_{ij}\xi_i\eta_j\Big) \Big/ n_{..}^2 = \frac{1}{n_{..} - 1}. \tag{4}$$

The test of significance for the association of the $\xi_i$ and $\eta_j$, given by Yates, is equivalent to
testing

$$r^2 / V(r) = (n_{..} - 1)\,r^2$$

as $\chi^2$ with one degree of freedom. Since r is also a correlation coefficient, an alternative test
is to treat

$$\frac{(n_{..} - 2)\,r^2}{1 - r^2}$$

as F with 1 and $n_{..} - 2$ degrees of freedom.

Both tests depend for their validity on an assumption of near-normality of the distribution;
moreover, they become equivalent when $n_{..}$ is large. As Cochran (1950) has pointed out in
a different connexion, either F or $\chi^2$ is in general an approximation to the true distribution.
We shall discuss the two approximations in Appendix 1, but shall here give preference to
the F-test, for the reasons set out in § V.
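A sketch of the calculation for assigned scores follows. It uses the frequencies of Table 1 in § VI and, purely as an assumption for illustration, equally spaced scores 1, 2, 3, 4 for both classifications; the scores are first brought to satisfy conditions (1) and (2). The function names are hypothetical.

```python
import math

def standardise(scores, weights):
    """Shift and scale scores so that sum(w*s) = 0 and sum(w*s^2) = sum(w)."""
    n = sum(weights)
    mean = sum(w * s for w, s in zip(weights, scores)) / n
    centred = [s - mean for s in scores]
    scale = math.sqrt(sum(w * c * c for w, c in zip(weights, centred)) / n)
    return [c / scale for c in centred]

def score_correlation(table, row_scores, col_scores):
    """Return r, the chi-square criterion (n - 1) r^2 and the F criterion."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    xi = standardise(row_scores, row_tot)
    eta = standardise(col_scores, col_tot)
    r = sum(table[i][j] * xi[i] * eta[j]
            for i in range(len(xi)) for j in range(len(eta))) / n
    chi2 = (n - 1) * r * r                  # one degree of freedom
    f = (n - 2) * r * r / (1 - r * r)       # 1 and n - 2 degrees of freedom
    return r, chi2, f

table_1 = [[5, 3, 10, 11],
           [4, 5, 8, 6],
           [26, 11, 3, 6],
           [23, 11, 1, 2]]
# Assumption: equally spaced scores, chosen only to illustrate the method.
r, chi2, f = score_correlation(table_1, [1, 2, 3, 4], [1, 2, 3, 4])
print(r, chi2, f)
```

Because the criteria do not depend on origin or scale, any linear transformation of the assigned scores gives the same $r^2$.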

We now consider the case in which only one set of scores is given (say, the $\xi_i$), and the
scores $\eta_j$ for the other classification are to be estimated as values $y_j$ so chosen as to maximize
the correlation between the two sets. This is analogous to a multiple correlation with $q-1$
independent variates. We have

$$n_{..}^2 R^2 = \Big(\sum_i \sum_j n_{ij}\xi_i y_j\Big)^2 \tag{5}$$

and require to maximize, with respect to the $y_j$, the expression

$$\Big(\sum_i \sum_j n_{ij}\xi_i y_j\Big)^2 - 2\kappa \sum_j n_{.j}y_j - \lambda \sum_j n_{.j}y_j^2.$$

Then

$$\Big(\sum_i n_{ij}\xi_i\Big)\Big(\sum_i \sum_k n_{ik}\xi_i y_k\Big) - \kappa n_{.j} - \lambda n_{.j}y_j = 0. \tag{6}$$

On summing with respect to j, we get $\kappa = 0$, and on multiplying by $y_j$ and summing with
respect to j, we find that

$$n_{..}\lambda = \Big(\sum_i \sum_j n_{ij}\xi_i y_j\Big)^2 = n_{..}^2 R^2. \tag{7}$$

Hence we have from (6), on substituting for $\lambda$,

$$n_{.j}y_j = \sum_i n_{ij}\xi_i / R. \tag{8}$$

This gives the values of the $y_j$ apart from a constant factor, which can be determined if
required from conditions (2). To determine $R^2$ explicitly in terms of the $\xi_i$, we square both
sides of (8), divide by $n_{.j}$ and sum with respect to j. Then

$$n_{..}R^2 = \sum_j \Big(\sum_i n_{ij}\xi_i\Big)^2 \Big/ n_{.j}. \tag{9}$$

The expected value of $R^2$, on the assumption of no association, can be shown to be

$$\frac{q-1}{n_{..}-1}.$$

Again, alternative tests of significance for $R^2$ are possible. If we regard R as a multiple
correlation coefficient, which it equals in variance, then

$$\frac{(n_{..}-q)\,R^2}{(q-1)(1-R^2)}$$

may be tested by the F-distribution with $q-1$ and $n_{..}-q$ degrees of freedom. Also, as
Yates shows, the above criterion is equivalent to a criterion for the homogeneity of the
mean scores of the columns. We may then test

$$n_{..}R^2$$

as a $\chi^2$ with $q-1$ degrees of freedom.
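Equations (8) and (9) translate directly into a few lines of code. In this sketch the row scores are, as an assumption for illustration only, equally spaced and then standardised to satisfy conditions (1) and (2); the frequencies are those of Table 1 in § VI and the function names are hypothetical.

```python
import math

def standardise(scores, weights):
    """Shift and scale scores so that sum(w*s) = 0 and sum(w*s^2) = sum(w)."""
    n = sum(weights)
    mean = sum(w * s for w, s in zip(weights, scores)) / n
    centred = [s - mean for s in scores]
    scale = math.sqrt(sum(w * c * c for w, c in zip(weights, centred)) / n)
    return [c / scale for c in centred]

def best_column_scores(table, xi):
    """Given standardised row scores xi, return R^2 from (9) and y_j from (8)."""
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(col_tot)
    col_sums = [sum(row[j] * x for row, x in zip(table, xi))
                for j in range(len(col_tot))]
    r_sq = sum(c * c / t for c, t in zip(col_sums, col_tot)) / n
    r_val = math.sqrt(r_sq)
    y = [c / (t * r_val) for c, t in zip(col_sums, col_tot)]
    return r_sq, y

table_1 = [[5, 3, 10, 11],
           [4, 5, 8, 6],
           [26, 11, 3, 6],
           [23, 11, 1, 2]]
row_tot = [sum(row) for row in table_1]
n, q = sum(row_tot), len(table_1[0])
xi = standardise([1, 2, 3, 4], row_tot)    # assumed scores, for illustration
r_sq, y = best_column_scores(table_1, xi)
f = (n - q) * r_sq / ((q - 1) * (1 - r_sq))   # q - 1 and n - q degrees of freedom
chi2 = n * r_sq                               # q - 1 degrees of freedom
print(r_sq, y, f, chi2)
```

The estimated $y_j$ come out already satisfying conditions (1) and (2), since dividing by R in (8) fixes the scale.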


III. THE PARTITION OF x? FROM A CONTINGENCY TABLE 


The analysis of the previous section shows how particular aspects of the departures of the
observed frequencies from proportionality may be tested for significance: either a single
comparison based on two sets of assigned scores, or a set of $q-1$ comparisons among column
means, based on scores for the rows. It is generally desirable, however, even when primary
interest attaches to one set of comparisons, to see whether the remaining comparisons show
significant departures from expectation. In other words, a $\chi^2$ corresponding to certain
assigned scores having been partitioned off from the total $\chi^2$, the remainder should also be
tested for significance, as indicating possible shortcomings of the assigned scores in accounting
for the variation in the data.

As a preliminary to the analysis given below where the two sets of scores are estimated
from the data, we now give the partition of the total $\chi^2$ into $p-1$ parts, each of $q-1$ degrees
of freedom. This is a generalization of methods given by Lancaster (1949). The $p-1$ parts
are quadratic forms in the observed frequencies, additive to the total $\chi^2$ and, on account of
the asymptotic normality of the distributions of the frequencies, asymptotically independent.
Hence, they may for practical purposes be considered to indicate independent departures
from the expected frequencies.

Consider any $p-1$ sets of $\xi_i$, denoted by $\zeta_{ui}$ ($u = 2, \ldots, p$), such that

$$\sum_i n_{i.}\zeta_{ui}\zeta_{vi} = 0 \quad (u \ne v), \qquad = n_{..} \quad (u = v). \tag{10}$$

Also let $\zeta_{1i} = 1$.
To each set of $\zeta_{ui}$ corresponds a $\chi^2$ with $q-1$ degrees of freedom, viz.

$$\chi_u^2 = \sum_j \Big(\sum_i n_{ij}\zeta_{ui}\Big)^2 \Big/ n_{.j}. \tag{11}$$

The value $\chi_1^2 = n_{..}$, corresponding to $\zeta_{1i}$, corresponds not to a comparison of frequencies
but to their mean, and is included to simplify the discussion.

Each $\chi_u^2$ is a sum of squares of linear forms in the frequencies. It may be shown that the
linear forms appearing in different $\chi_u^2$ are uncorrelated, and hence asymptotically independent.
The $\chi_u^2$ also are therefore asymptotically independent.

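Moreover, as will now be shown, the sum of the $\chi_u^2$, other than $\chi_1^2$, equals the total $\chi^2$.
For consider the $p \times q$ matrix G, for which

$$g_{ij} = \frac{n_{ij}}{(n_{i.}n_{.j})^{1/2}}. \tag{12}$$

The $p \times p$ matrix

$$T = GG' \tag{13}$$

has

$$t_{hi} = \sum_j \frac{n_{hj}n_{ij}}{(n_{h.}n_{i.})^{1/2}\,n_{.j}}, \tag{14}$$

and is of rank q. Its spur is

$$\sum_i \sum_j \frac{n_{ij}^2}{n_{i.}n_{.j}} = \frac{\chi^2}{n_{..}} + 1. \tag{15}$$

Now the $p \times p$ matrix S, with

$$s_{iu} = m_i\zeta_{ui}, \tag{16}$$

where $m_i = (n_{i.}/n_{..})^{1/2}$, is orthogonal. The transform of T by S is therefore $S'TS$, of which the spur is found to be

$$\sum_{u=1}^{p} \frac{\chi_u^2}{n_{..}} = \sum_{u=2}^{p} \frac{\chi_u^2}{n_{..}} + 1. \tag{17}$$

Hence, it follows, on equating (15) and (17), that

$$\sum_{u=2}^{p} \chi_u^2 = \chi^2. \tag{18}$$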

IV. THE ESTIMATION OF SCORES FROM THE DATA 


When neither set of scores is given, it is necessary to estimate them from the data. The
maximum-likelihood estimates, depending as they do on the non-null distribution of the
frequencies when association is assumed present, are difficult to determine. Some consideration
is given to the non-null distribution in Appendix 2. However, for all practical purposes,
and provided the sample size is sufficiently large, the maximum-likelihood estimates are
approximated by the least-squares estimates, and these will be considered henceforth.

We denote estimated scores by

$$x_i \ (i = 1, 2, \ldots, p), \qquad y_j \ (j = 1, 2, \ldots, q),$$

choosing origins and scales of measurement so that conditions (1) and (2) hold.
Then the sample correlation between x and y, whose square is to be maximized for
variations of the $x_i$, $y_j$, is

$$R = \sum_i \sum_j n_{ij}x_i y_j / n_{..}. \tag{19}$$

As in § II, we find, on maximizing with respect to the $y_j$, that

$$n_{..}R^2 = \sum_j \Big(\sum_i n_{ij}x_i\Big)^2 \Big/ n_{.j}. \tag{20}$$

We now maximize, with respect to the $x_i$, the expression

$$\sum_j \Big(\sum_i n_{ij}x_i\Big)^2 \Big/ n_{.j} - 2\mu \sum_i n_{i.}x_i - \nu \sum_i n_{i.}x_i^2.$$

Then

$$\sum_j \Big(\sum_h n_{hj}x_h\Big) n_{ij}/n_{.j} - \mu n_{i.} - \nu n_{i.}x_i = 0. \tag{21}$$

On summing with respect to i, we find that $\mu = 0$. Multiplying (21) by $x_i$ and summing
with respect to i, we find

$$n_{..}\nu = \sum_j \Big(\sum_i n_{ij}x_i\Big)^2 \Big/ n_{.j} = n_{..}R^2. \tag{22}$$

Hence, from the set of equations (21), it is seen that $R^2$ is a latent root of the matrix T of
equation (13).

The corresponding latent vector is $\{m_i x_i\}$. It may be shown that the latent vectors
corresponding to different latent roots of T are unit vectors, uncorrelated in the sample. Since
T is of rank q, it has q latent roots not identically zero. Of these, however, one is identically
unity, and corresponds to a latent vector $\{m_i\}$, that is, equal values of $x_i$. This latent root
and its latent vector are not relevant to the test for association, and could be eliminated by
taking departures from expectation of the frequencies in the table. The remaining latent
roots will be designated, in order of magnitude, $R_1^2, R_2^2, \ldots, R_{q-1}^2$. The largest of these gives
the required maximized correlation, and its latent vector, the set of scores. In practice,
these scores, giving the maximum correlation, are the ones which would be used.
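The largest root and its scores may be obtained, for instance, by power iteration on T after the unit root has been removed. This is a sketch under stated assumptions: power iteration is a generic numerical device chosen here for brevity, not the author's computing procedure; the starting vector and iteration count are arbitrary; the frequencies are those of Table 1 in § VI.

```python
import math

def first_canonical(table, iters=200):
    """Largest latent root of T = GG' (unit root removed) and the scores x_i, y_j."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n, p, q = sum(row_tot), len(row_tot), len(col_tot)
    m = [math.sqrt(r / n) for r in row_tot]
    # t_hi from equation (14), less m_h * m_i to remove the latent root unity.
    T = [[sum(table[h][j] * table[i][j] / col_tot[j] for j in range(q))
          / math.sqrt(row_tot[h] * row_tot[i]) - m[h] * m[i]
          for i in range(p)] for h in range(p)]
    v = [(-1.0) ** i * (i + 1) for i in range(p)]     # arbitrary starting vector
    for _ in range(iters):
        w = [sum(T[h][i] * v[i] for i in range(p)) for h in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    root = sum(v[h] * T[h][i] * v[i] for h in range(p) for i in range(p))
    r_val = math.sqrt(root)
    x = [v[i] / m[i] for i in range(p)]               # latent vector is {m_i x_i}
    y = [sum(table[i][j] * x[i] for i in range(p)) / (col_tot[j] * r_val)
         for j in range(q)]                           # equation (8)
    return r_val, x, y

table_1 = [[5, 3, 10, 11],
           [4, 5, 8, 6],
           [26, 11, 3, 6],
           [23, 11, 1, 2]]
r1, x, y = first_canonical(table_1)
print(round(r1, 5), [round(s, 4) for s in x], [round(s, 4) for s in y])
```

The signs of the two sets of scores may be reversed together without altering the correlation.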

Since the procedure of calculation of latent roots and latent vectors has been described 
in this and other connexions by many writers, and is in any case the same as that used in 
discriminant analysis for normally distributed variates, it is not discussed further here. 


V. TESTS OF SIGNIFICANCE 


Various tests of significance have in the past been employed for testing the presence of 
association in the data and the adequacy of the sets of scores. Fisher (1940, 1950) uses 
analysis of variance to compare a sum of squares corresponding to the association being 
tested with a residual sum of squares; the number of degrees of freedom for association is 
equated to the number of independent constants fitted, even when the equations of estimation
are non-linear. Bartlett (1938) uses $\chi^2$ tests, and, in recognizing that these are only
approximate, has developed adjustments designed to improve their accuracy.

It is proposed here to adapt significance tests (Williams, 1952a), known to be exact when
the variation is normal, to data in the form of frequencies. The justification which will be 
produced is that the expected values of the elementary symmetric functions of the latent 
roots are, on the null hypothesis, the same for frequencies as for normal data. This is an 
extension of the result referred to for the case of a set of assigned scores, that the expectation 
of R? equals that of a multiple correlation coefficient. 


(a) Expected values of elementary symmetric functions of latent roots 


We denote the elementary symmetric function of degree s in the latent roots of T by $a_s$.
It is equal to the sum of the $\binom{p}{s}$ principal minors of order s. Moreover, since $T = GG'$, it
may be shown by direct calculation that any principal $s \times s$ minor of T is equal to the sum
of squares of all the minors of order s based on the corresponding rows of G. It therefore
follows that $a_s$ is equal to the sum of squares of the $\binom{p}{s}\binom{q}{s}$ minors of G, of order s.

In order to eliminate the irrelevant latent root unity, we take departures from expectation,
given by the matrix

$$G_d = G - \bar{G}, \quad \text{where} \quad \bar{g}_{ij} = (n_{i.}n_{.j})^{1/2}/n_{..}.$$

Similarly, if N is the matrix of subclass frequencies,

$$N_d = N - \bar{N}, \quad \text{where} \quad \bar{n}_{ij} = n_{i.}n_{.j}/n_{..}.$$

The rank of $G_d$ and $N_d$ is $q-1$.

Consider any s rows and columns of N; for simplicity, these may be taken as the first s.
The remaining $p-s$ rows and $q-s$ columns may be combined to give an $(s+1) \times (s+1)$
matrix, whose determinant is readily found to be equal to $n_{..}$ times the $s \times s$ minor of
departures from expectation. The expected value of the squared $(s+1) \times (s+1)$ determinant
is a symmetric function of its marginal totals, namely,

$$n_{1.},\ n_{2.},\ \ldots,\ n_{s.},\ (n_{..} - n_{1.} - n_{2.} - \cdots - n_{s.})$$

and

$$n_{.1},\ n_{.2},\ \ldots,\ n_{.s},\ (n_{..} - n_{.1} - n_{.2} - \cdots - n_{.s}).$$

Moreover, if any of these values is zero, the determinant is identically zero. Hence the
expected value of the squared $(s+1) \times (s+1)$ determinant has as factors all these marginal
totals.

Further, the expectation of any polynomial of degree $2(s+1)$ in the $n_{ij}$ is, in general,
a rational fraction of which the denominator is

$$n_{..}(n_{..}-1)\cdots(n_{..}-2s-1).$$

However, the expectation in this case is finite whenever $n_{..} > s$. Consequently only the
factors

$$n_{..}(n_{..}-1)\cdots(n_{..}-s)$$


are admissible in the denominator.

The expected value of the squared $(s+1) \times (s+1)$ determinant is therefore a multiple of

$$\frac{n_{1.}n_{2.}\cdots n_{s.}(n_{..} - n_{1.} - \cdots - n_{s.})\; n_{.1}n_{.2}\cdots n_{.s}(n_{..} - n_{.1} - \cdots - n_{.s})}{(n_{..}-1)(n_{..}-2)\cdots(n_{..}-s)},$$

and examination of particular cases shows that the required multiple is $s!$.
The expected value of the corresponding squared $s \times s$ minor of $N_d$ is accordingly

$$\frac{s!\; n_{1.}n_{2.}\cdots n_{s.}(n_{..} - n_{1.} - \cdots - n_{s.})\; n_{.1}n_{.2}\cdots n_{.s}(n_{..} - n_{.1} - \cdots - n_{.s})}{n_{..}^2(n_{..}-1)(n_{..}-2)\cdots(n_{..}-s)}. \tag{23}$$

Any squared minor of $N_d$ is equal to the corresponding squared minor of $G_d$, multiplied by
the product of the 2s marginal totals. Hence the squared determinant of the first s rows and
columns of $G_d$ has expected value

$$\frac{s!\,(n_{..} - n_{1.} - n_{2.} - \cdots - n_{s.})(n_{..} - n_{.1} - n_{.2} - \cdots - n_{.s})}{n_{..}^2(n_{..}-1)(n_{..}-2)\cdots(n_{..}-s)}. \tag{24}$$

Summing the expression (24) over all the $\binom{p}{s}\binom{q}{s}$ minors of order s, we find, for the elementary
function of degree s, the expected value

$$\frac{s!\,\binom{p-1}{s}\binom{q-1}{s}}{(n_{..}-1)(n_{..}-2)\cdots(n_{..}-s)}.$$

This expected value is the same as is given by the normal distribution, when

p - 1 = number of variates,
q - 1 = number of degrees of freedom between groups,
$n_{..}$ - 1 = number of degrees of freedom within and between groups.


(b) Test of significance for hypothetical scores 


It is known that, while the expected value of $\chi^2$ for a contingency table depends only on
the total frequency for the table and is independent of the marginal totals, the variance 
(some discussion of which is given in Appendix 1) and other moments depend on the mar- 
ginal totals. Consequently, general results for moments above the first are difficult to obtain. 
In developing significance tests it seems therefore best to rely on mean-value results and the 
approach to normality. In particular, the results of the preceding subsection give some 
justification for adopting an ‘analysis of variance’ approach in setting up significance tests 
for a proposed set of scores. The significance tests appropriate to normal variation have 
been developed elsewhere (Williams, 1952a). Their application only is indicated here, in 
the simple cases q = 2,3. We may assume either the row-scores or column-scores given, and 
require to determine whether they are consistent with the data. More generally, we may 
require to determine fiducial limits for the population scores. 

q = 2. In this case there is but one sample latent root,

$$R_1^2 = \chi^2/n_{..},$$

corresponding to the total $\chi^2$; the y-scores are fixed by the normalizing conditions, and the
$x_i$ are proportional to

$$\frac{1}{n_{i.}}\left(\frac{n_{i1}}{n_{.1}} - \frac{n_{i2}}{n_{.2}}\right).$$

To any arbitrarily chosen set of scores $\xi_i$ corresponds an $R^2$ with one degree of freedom,
namely,

$$\Big(\sum_i n_{i1}\xi_i\Big)^2 \Big/ (n_{.1}n_{.2}).$$

The consistency of these scores with the data may be determined by testing the significance
of $R_1^2 - R^2$ as a multiple correlation, using

$$F = \frac{(n_{..} - p + 1)(R_1^2 - R^2)}{(p-2)(1 - R_1^2 + R^2)}$$

with $p-2$ and $n_{..} - p + 1$ degrees of freedom, or alternatively, treating $n_{..}(R_1^2 - R^2)$ as $\chi^2$ with
$p-2$ degrees of freedom.

q = 3. The matrix T has now two latent roots $R_1^2$ and $R_2^2$, and in addition $p-3$ latent
roots identically zero. The significance test for a given set of scores differs according as
scores for rows or columns are given. This is because row scores may correspond to zero 
correlation, while column scores do not. 





When column scores $\eta_j$ are given, the corresponding squared correlation coefficient may
be designated $R_c^2$. A test of the adequacy of these scores is given by the following analysis
of variance:








Degrees of freedom Sums of squares 
2_ p2) (p2_ p2 
Departures from assigned scores 1 (Ri ft BB) R:) 
Ri R3 
Departures from assumed rank p-2 <r" 
_ Rp) (1— R? 
Residual n,.—p-l : a —_ 
~ Total — na 2 eg. aes 


This analysis is formally equivalent to that given for testing a proposed discriminant
function (Williams, 1952a).

When row scores $\xi_i$ are given, the corresponding squared correlation coefficient, which is
designated $R_r^2$, takes values between 0 and $R_1^2$. Corresponding to the latent roots $R_1^2$ and $R_2^2$
we have the sets of scores $x_{1i}$ and $x_{2i}$ respectively. In order to apply the analysis given above,
it is necessary to make a projection of the $\xi_i$-space on to the $x_{1i}$, $x_{2i}$ plane. We put

$$r_1 = \sum_i n_{i.}x_{1i}\xi_i / n_{..}$$

and

$$r_2 = \sum_i n_{i.}x_{2i}\xi_i / n_{..}.$$

Then it may be shown that

$$R_r^2 = r_1^2R_1^2 + r_2^2R_2^2.$$

We now put

$$R_r'^2 = \frac{r_1^2R_1^2 + r_2^2R_2^2}{r_1^2 + r_2^2},$$

so that $R_1^2 \ge R_r'^2 \ge R_2^2$.

The analysis now has two parts, as follows: 








Departures froin assigned scores Degrees of freedom Sums of squares 
' (Ri — R,*) (RP — Ri) 
In the 2,,;—2,; plane 1 (I= RS) (R2— BERD) 
‘ R?—-R? 
In the p—3 space p-3 oR 
27] — R®) (] — R2 
Kesidual a,.-p—1 R/(1 — Ry) (1— R3) 


2 Bt a (1— Rf) (RP — BYR) 
Total n.—-3 1 
















Departures from assumed rank Degrees of freedom Sums of squares 

’ 
: (Ri — R?) (R? — Ri) 
Scores in the z,;—2,,; plane 1 R3(1— R?) 
R? 
Rank p—2 ; = 
R, 
: (1 — Rf) (1 — R#) 

Residual e..—Pp—s her = re 

Total n_—2 ana 


It may be added that, when $n_{..}$ is large, a $\chi^2$ analysis gives almost identical results but is
much simpler. The above pair of analyses would be modified in the following way: 











Departures Degrees of freedom Sums of squares 
' Scores in the x,;—2,; plane I (Xi- xe =x) 
Scores in the p—3 space p-3 Bat . x? 
Rank »-3 a a 
~ Total departures ~ 2(p—2) — Mit+X%—X;- 
The use of these significance tests will be shown in the numerical example given below. 


VI. NUMERICAL EXAMPLE 


The data of Table 1 are taken from Treloar (1939, p. 228). For an overall test of association,
we have $\chi^2 = 44.345$,
with 9 degrees of freedom, which is highly significant. A canonical analysis leads to the
three correlations set out in Table 2. The degrees of freedom correspond in each case to the
number of constants fitted, but are not proportional to the expected values of the components.
When the variates are normally distributed, the expected values of the three
components of a total sum of squares resulting from a canonical analysis are proportional
to the values shown in the fourth column of the table. Such a comparison is useful when the
hypothesis of no association is being examined; when this hypothesis is no longer accepted
the comparison is not relevant. However, departure from the null hypothesis increases the
expectations of all latent roots: if one population root exists, and is large, the expectations
of the second and third components tend to the values shown in the fifth column of Table 2.
Since the second and third components of $\chi^2$ are both less than their expectations, it may
be concluded that there is evidence for the existence of only one population root.
Corresponding to the maximum correlation 0.56273, the values of the two sets of scores
for periodontal condition and calcium intake are, respectively,

$$x_1 = -1.3880, \quad y_1 = 0.8397,$$
$$x_2 = -1.0571, \quad y_2 = 0.4819,$$
$$x_3 = 0.6016, \quad y_3 = -1.5779,$$
$$x_4 = 0.9971, \quad y_4 = -1.1378.$$


These scores have zero mean and sum of squares 135. It is to be noted that, whereas the
scores y, being positively correlated with severity of condition, would be expected to
decrease with calcium intake, the scores $y_3$ and $y_4$ reverse the trend. This result is possibly
due to sampling errors, together with the fact that a ceiling has been reached, above which
increases in calcium intake have no effect on condition. That the difference between these
two scores is not significant may be seen by combining the two classes, and recalculating.
The overall $\chi^2$ is now 42.864, with 6 degrees of freedom, the reduction being non-significant.
The two canonical correlations for the new contingency table are given in Table 3, and the
scores corresponding to the first canonical correlation are

$$x_1 = -1.4072, \quad y_1 = 0.8448,$$
$$x_2 = -1.0243, \quad y_2 = 0.4906,$$
$$x_3 = 0.5906, \quad y_{3,4} = -1.3557,$$
$$x_4 = 1.0054.$$


Again, the comparison of observed and expected values shows only one significant latent 
root. 

It is now of interest to derive fiducial limits for the true scores. Since there is no indication 
of departure from the assumed rank, the significance test may be confined to the consistency 
of assumed scores with the data. If $R_c^2$ is the squared correlation coefficient corresponding
to a given set of column scores, then we have, with 1 and 130 degrees of freedom,

$$F = \frac{130\,(R_1^2 - R_c^2)(R_c^2 - R_2^2)}{R_c^2\,(1 - R_1^2)(1 - R_2^2)} = \frac{130\,(0.31145 - R_c^2)(R_c^2 - 0.00606)}{0.68438\,R_c^2}.$$








The 5 % point of $F$ is 3.9140, corresponding to $R_c = 0.53889$; the 1 % point of $F$ is 6.8340,
corresponding to $R_c = 0.52408$.

The fiducial limits for the scores for calcium intake for a chosen significance level will be
the values of the scores corresponding to the above values of $R_c$.
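
The correspondence between the percentage points of $F$ and the limiting correlations can be reproduced numerically. The sketch below assumes the form of the $F$ ratio given above, with the squared canonical correlations of Table 3 as defaults; the names `f_ratio` and `limiting_correlation` are hypothetical, and bisection is simply one convenient way of inverting the relation.

```python
def f_ratio(r_c, r1_sq=0.31145, r2_sq=0.00606, dof=130):
    """F ratio (1 and `dof` degrees of freedom) for scores giving correlation r_c."""
    rc_sq = r_c * r_c
    return dof * (r1_sq - rc_sq) * (rc_sq - r2_sq) / (rc_sq * (1.0 - r1_sq) * (1.0 - r2_sq))


def limiting_correlation(f_crit, lo=0.3, hi=0.31145 ** 0.5, tol=1e-10):
    """Correlation at which F equals f_crit; F decreases with r_c on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_ratio(mid) > f_crit:
            lo = mid  # F still too large, so the limit lies at a larger r_c
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(limiting_correlation(3.9140), 4))  # 5 % limit
print(round(limiting_correlation(6.8340), 4))  # 1 % limit
```

Any chosen set of scores whose correlation with the data exceeds the 5 % limit gives a non-significant $F$ and is therefore not inconsistent with the data at that level.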

We may, further, consider the concordance with the data of any chosen set of scores. 
For example, suppose 


Wy = Ne RE 0-7308 o*9 


Then $R_c = 0.55302$, so that the chosen scores are not inconsistent with the data.


Table 1. Periodontal condition and average daily calcium intake of 135 women

                          Calcium per day (in g.)
Periodontal
condition       -0.40    0.40-0.55    0.55-0.70    0.70-     Total
    A              5         3           10          11        29
    B              4         5            8           6        23
    C             26        11            3           6        46
    D             23        11            1           2        37

  Total           58        30           22          25       135
Table 2. Canonical correlations, and corresponding components of $\chi^2$, together with
values expected on alternative hypotheses about population root $\lambda_1$

                        Degrees of               Expectations
Component      R         freedom      $\lambda_1 = 0$            $\lambda_1$ large              $\chi^2 = 135R^2$
First       0.56273         5       $5+\sqrt{3} = 6.732$            -                    42.74984
Second      0.10869         3       2                       $2+\tfrac{1}{2}\pi = 3.571$        1.59497
Third       0.00045         1       $2-\sqrt{3} = 0.268$     $2-\tfrac{1}{2}\pi = 0.429$        0.00003
                            9       9                            -                    44.34484


Table 3. Analysis of $4 \times 3$ table obtained by combining last two columns of Table 1

                        Degrees
Component      R        of freedom     Expectation     $\chi^2 = 135R^2$
First       0.55807         4               5             42.04531
Second      0.07785         2               1              0.81825
                            6               6             42.86356
APPENDIX 1 


Limiting values of the variance of $\chi^2$ for a contingency table

Since the test of significance for $\chi^2$ from a contingency table is based on the approximation
by a Type I or Type III distribution to the actual distribution, the value of determining the
variance of $\chi^2$ lies not in providing a significance test but in checking whether the assumed
approximation by a continuous distribution is satisfactory. For this reason, and because
the variance of $\chi^2$ is in general difficult to compute, only upper and lower limits for the
variance are determined here.

For a $p \times q$ table, the assumed variance of $\chi^2/n_{..}$, regarded as a squared correlation coefficient, is

$$\frac{2(p-1)(q-1)\{n_{..}-1-(p-1)(q-1)\}}{(n_{..}-1)^2(n_{..}+1)},$$ (A 1)

while that of $\chi^2$, from the Type III distribution, is

$$2(p-1)(q-1).$$ (A 2)

The actual variance depends on the values of the marginal totals, ranging from a minimum
value when all the marginal totals are equal, which is somewhat less than the assumed value
(A 2), to a value which, for the degenerate table in which most of the frequency is concentrated
in one class of each classification, is $O(n_{..})$.

We consider first the minimum variance, which occurs when all the row totals and column
totals are equal. Clearly, this minimum is attained only when $n_{..}$ is a multiple of $p$ and of $q$.
It can be shown that, in this case, since all the $n_{i.}$ and $n_{.j}$ are factors of $n_{..}$, and provided that
$n_{..} > 3$, $E(\chi^4/n_{..}^2)$ is a rational fraction with denominator $n_{..}(n_{..}-1)(n_{..}-2)(n_{..}-3)$. Also,
$E(\chi^2/n_{..})$ has denominator $(n_{..}-1)$, so that $V(\chi^2/n_{..})$ has denominator

$$n_{..}(n_{..}-1)^2(n_{..}-2)(n_{..}-3).$$


Then from considerations of order of magnitude, the numerator in $V(\chi^2/n_{..})$ is a polynomial
of degree 3 in $n_{..}$. When $n_{..} = p$, the total frequency for each row must be 1, so that the subclasses
consist only of arrangements of 0's and 1's. Then $\chi^2$ takes only the value $p(q-1)$,
and its variance is zero. Hence $n_{..} - p$ is a factor of the variance, and so, by symmetry, is
$n_{..} - q$. We then have

$$V(\chi^2) = \frac{2(p-1)(q-1)\,n_{..}(n_{..}-p)(n_{..}-q)(n_{..}-A)}{(n_{..}-1)^2(n_{..}-2)(n_{..}-3)},$$ (A 3)
where $A$ requires to be determined. From examination of particular cases, it appears that

$$V_{\min}(\chi^2) = \frac{2(p-1)(q-1)\,n_{..}(n_{..}-p)(n_{..}-q)}{(n_{..}-1)^2(n_{..}-3)}.$$ (A 4)





This is generally, but not always, greater than the value derived from (A 1) but less than 
the value (A 2). 
The maximum variance, for given $n_{..}$, $p$ and $q$, may be seen to occur when

$n_{1.} = n_{..}-p+1$, $n_{i.} = 1$ $(i \neq 1)$,
$n_{.1} = n_{..}-q+1$, $n_{.j} = 1$ $(j \neq 1)$,

and its value, as may be verified, is then

$$V(\chi^2) = (p-1)(q-1)\{n_{..}-p-q+3+O(n_{..}^{-1})\}.$$ (A 5)


This value does not tend to a finite limit as n.. increases. It appears that, for contingency 
tables in which some marginal totals are large compared with the others, the approximation 
by a continuous Type I or Type III distribution may not provide a satisfactory significance 
test. 
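
For the $2 \times 2$ table the limiting variances can be verified by direct enumeration, since the conditional distribution of the table for fixed margins is hypergeometric. The sketch below is an illustration only and covers only the case $p = q = 2$; the function names are hypothetical, and exact rational arithmetic is used so that the comparison with the minimum-variance formula (A 4) is free of rounding error.

```python
from fractions import Fraction
from math import comb


def exact_chi2_variance_2x2(r1, r2, c1, c2):
    """Exact variance of chi-square for a 2 x 2 table, conditional on its margins."""
    n = r1 + r2
    assert n == c1 + c2
    mean = Fraction(0)
    mean_sq = Fraction(0)
    # a is the top-left cell; the other three cells follow from the margins.
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        b, c = r1 - a, c1 - a
        d = r2 - c
        prob = Fraction(comb(r1, a) * comb(r2, c), comb(n, c1))  # hypergeometric
        chi2 = Fraction(n * (a * d - b * c) ** 2, r1 * r2 * c1 * c2)
        mean += prob * chi2
        mean_sq += prob * chi2 * chi2
    return mean_sq - mean * mean


def min_variance_a4(n, p, q):
    """Minimum variance of chi-square according to formula (A 4)."""
    return Fraction(2 * (p - 1) * (q - 1) * n * (n - p) * (n - q), (n - 1) ** 2 * (n - 3))


# Equal margins: the enumeration agrees exactly with (A 4).
print(exact_chi2_variance_2x2(3, 3, 3, 3), min_variance_a4(6, 2, 2))

# Degenerate margins: the variance grows like n - p - q + 3 = n - 1, as in (A 5).
print(float(exact_chi2_variance_2x2(49, 1, 49, 1)))
```

With equal margins the variance stays close to the Type III value $2(p-1)(q-1) = 2$, whereas with the degenerate margins it increases without limit as $n_{..}$ increases.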
APPENDIX 2
The contingency table distribution when association is present


The simplest form of association occurs when the departures of the subclass probabilities 
from proportionality are of rank 1. Such association is most readily dealt with statistically, 
since there is but one canonical correlation in the population, and the association may be 
interpreted as a relationship between a pair of sets of scores. Only this case will be con- 
sidered here. 

The probability of class $(i, j)$ may be written

$$p_{ij} = p_{i.}\,p_{.j}\,(1 + \rho\,\xi_i\eta_j),$$ (A 6)

in which the $p_{i.}$ and $p_{.j}$ are the marginal probabilities. The significance of the other variables
will appear later. Since

$$\sum_j p_{ij} = p_{i.}, \qquad \sum_i p_{ij} = p_{.j},$$

we have at once

$$\sum_i p_{i.}\xi_i = 0 = \sum_j p_{.j}\eta_j.$$ (A 7)
e I 


In studying the association in the table we require a distribution independent of the $p_{i.}$
and $p_{.j}$. When there is no association, this is given by the distribution of subclass frequencies,
conditional on the given set of marginal totals. When association exists, this conditional
distribution is no longer independent of the $p_{i.}$ and $p_{.j}$ on account of the relations (A 7).
However, the distribution does show the effect of association on the statistics used in
testing significance.
The probability of the set of subclass frequencies is

$$P(\{n_{ij}\}) = \frac{n_{..}!}{\prod_{ij} n_{ij}!}\,\prod_i p_{i.}^{\,n_{i.}}\,\prod_j p_{.j}^{\,n_{.j}}\,\prod_{ij}(1+\rho\,\xi_i\eta_j)^{n_{ij}}.$$ (A 8)
We first require to determine the probability of the given marginal totals, and this is
obtained by summing the expression (A 8) over all sets $\{n_{ij}\}$ giving these marginal totals.
The sum will be denoted by $P_\rho(M)$.

Now in the case of independence, i.e. when $\rho = 0$, it is known that

$$P_0(M) = \frac{(n_{..}!)^2}{\prod_i n_{i.}!\,\prod_j n_{.j}!}\,\prod_i p_{i.}^{\,n_{i.}}\,\prod_j p_{.j}^{\,n_{.j}}.$$ (A 9)

Hence it can be seen that the probability when association is present is given by multiplying
$P_0(M)$ by a factor which is the expected value for zero association, and conditional
on the marginal totals, of

$$\prod_{ij}(1+\rho\,\xi_i\eta_j)^{n_{ij}}.$$ (A 10)

Such a conditional expectation will be denoted by $E_M$.

The expectation of the expression (A 10) is determined as follows. Consider a variate $z$,
taking the value $\xi_i\eta_j$ if a sample value belongs to subclass $(i, j)$. Then the expectation of
(A 10) is

$$E_M\Big[\prod_{a=1}^{n_{..}}(1+\rho z_a)\Big].$$ (A 11)

The coefficient of $\rho^b$ in the expectand in (A 11) is the elementary symmetric function in
the $z_a$ of degree $b$, and hence is homogeneous, of degree $b$, in both the $\xi_i$ and the $\eta_j$. The
expectation of this term is $\binom{n_{..}}{b}$ times the expected value of the product of any $b$ of the $z_a$.


Moreover, since the probabilities for rows and columns are independent, this expected
value is the product of a function of the $\xi_i$ and a function of the $\eta_j$. These two functions are
readily determined. For, putting all the $\eta_j = 1$ in (A 10), we see that the product becomes

$$\prod_i (1+\rho\xi_i)^{n_{i.}},$$ (A 12)

which is constant when the marginal totals are fixed and hence equals its expectation. The
required function of the $\xi_i$ in the coefficient of $\rho^b$ is therefore the coefficient of $\rho^b/b!$ in
(A 12), which will be denoted by $\Xi_b$. Similarly, the required function of the $\eta_j$ will be the
coefficient of $\rho^b/b!$ in

$$\prod_j (1+\rho\eta_j)^{n_{.j}},$$ (A 13)

denoted by $H_b$.

Finally, since $\Xi_b = H_b = n_{..}(n_{..}-1)\ldots(n_{..}-b+1)$ when all the $\xi_i$ and $\eta_j = 1$, the coefficient
of $\rho^b/b!$ is seen to be

$$\frac{\Xi_b H_b}{n_{..}(n_{..}-1)\ldots(n_{..}-b+1)}.$$ (A 14)

For small values of $b$, $\Xi_b$ and $H_b$ may be determined explicitly. Putting

$$X_s = \sum_i n_{i.}\xi_i^s, \qquad Y_s = \sum_j n_{.j}\eta_j^s,$$ (A 15)

we have

$b$     $\Xi_b$
1     $X_1$
2     $X_1^2 - X_2$
3     $X_1^3 - 3X_1X_2 + 2X_3$,

etc., with similar formulae for the $H_b$.
The $\Xi_b$ may be successively calculated by means of the recurrence relation

$$\Xi_b = X_1\Xi_{b-1} - (b-1)X_2\Xi_{b-2} + \ldots + (-1)^{b-1}(b-1)!\,X_b.$$ (A 16)

It can be shown that, in general, the expected value of any symmetric function of the
$z$ is proportional to the product of the corresponding symmetric functions of the $\xi_i$ and $\eta_j$.
It is convenient to express these functions in terms of the power sums $X_1, X_2, \ldots$ and
$Y_1, Y_2, \ldots$, as has been done with $\Xi_b$, $H_b$.
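
The recurrence (A 16) can be checked against the definition of $\Xi_b$ as $b!$ times the coefficient of $\rho^b$ in (A 12). The following sketch is illustrative only: the function names and the small set of marginal totals and integer scores are assumptions chosen for the check, and the general term of the recurrence is taken to be $(-1)^k (b-1)!/(b-1-k)!\,X_{k+1}\Xi_{b-1-k}$, which follows from differentiating (A 12) with respect to $\rho$.

```python
from math import factorial


def power_sum(totals, scores, s):
    """X_s = sum_i n_i. * xi_i**s."""
    return sum(n * x ** s for n, x in zip(totals, scores))


def xi_by_recurrence(totals, scores, b_max):
    """Xi_0 .. Xi_b_max from the recurrence (A 16), starting from Xi_0 = 1."""
    xi = [1]
    for b in range(1, b_max + 1):
        total = 0
        for k in range(b):
            coeff = (-1) ** k * (factorial(b - 1) // factorial(b - 1 - k))
            total += coeff * power_sum(totals, scores, k + 1) * xi[b - 1 - k]
        xi.append(total)
    return xi


def xi_by_expansion(totals, scores, b_max):
    """Xi_b as b! times the coefficient of rho**b in prod_i (1 + rho*xi_i)**n_i."""
    poly = [1]  # coefficients of rho**0, rho**1, ...
    for n, x in zip(totals, scores):
        for _ in range(n):
            # Multiply the polynomial by (1 + rho * x).
            poly = [
                (poly[d] if d < len(poly) else 0) + (poly[d - 1] * x if d >= 1 else 0)
                for d in range(len(poly) + 1)
            ]
    poly += [0] * (b_max + 1 - len(poly))
    return [factorial(b) * poly[b] for b in range(b_max + 1)]


totals, scores = [2, 3, 1], [1, -2, 3]
print(xi_by_recurrence(totals, scores, 4))
print(xi_by_expansion(totals, scores, 4))
```

For these values $X_1 = -1$, $X_2 = 23$ and $X_3 = 5$, so that $\Xi_2 = X_1^2 - X_2 = -22$ and $\Xi_3 = X_1^3 - 3X_1X_2 + 2X_3 = 78$, in agreement with both routines.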
It will be convenient to write

$$X_s' = \sum_i n_{i.}(\xi_i - X_1/n_{..})^s, \qquad Y_s' = \sum_j n_{.j}(\eta_j - Y_1/n_{..})^s.$$ (A 17)

Since the constant $\rho$ is at disposal, we may standardize the scale of the $\xi_i$, $\eta_j$ by putting

$$X_2' = Y_2' = n_{..}.$$ (A 18)

Then $\rho$ may be identified as the correlation in the population, and the $\xi_i$, $\eta_j$ as the corresponding
population scores.





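We have

$$P_\rho(M) = P_0(M)\,E_M\prod_{ij}(1+\rho\,\xi_i\eta_j)^{n_{ij}} = P_0(M)\Big\{1 + \frac{\rho\,\Xi_1H_1}{n_{..}} + \frac{\rho^2\,\Xi_2H_2}{2!\,n_{..}(n_{..}-1)} + \frac{\rho^3\,\Xi_3H_3}{3!\,n_{..}(n_{..}-1)(n_{..}-2)} + \ldots\Big\}.$$ (A 19)

Finally, the probability of the subclass frequencies, conditional on the marginal totals, is

$$\frac{\prod_i n_{i.}!\,\prod_j n_{.j}!\,\prod_{ij}(1+\rho\,\xi_i\eta_j)^{n_{ij}}}{n_{..}!\,\prod_{ij} n_{ij}!\,\Big\{1 + \dfrac{\rho\,\Xi_1H_1}{n_{..}} + \dfrac{\rho^2\,\Xi_2H_2}{2!\,n_{..}(n_{..}-1)} + \dfrac{\rho^3\,\Xi_3H_3}{3!\,n_{..}(n_{..}-1)(n_{..}-2)} + \ldots\Big\}}.$$ (A 20)

Where $\rho$ is small, the factor introducing $\rho$ into (A 20) is approximately

$$1 + \rho\Big\{\sum_{ij} n_{ij}\xi_i\eta_j - \sum_i n_{i.}\xi_i \sum_j n_{.j}\eta_j \Big/ n_{..}\Big\},$$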





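so that a test of significance for the existence of $\rho$, corresponding to given scores $\xi_i$, $\eta_j$,
satisfying (A 7), is provided by

$$r = \frac{1}{n_{..}}\sum_{ij} n_{ij}\Big(\xi_i - \frac{X_1}{n_{..}}\Big)\Big(\eta_j - \frac{Y_1}{n_{..}}\Big).$$ (A 21)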


However, there is no sufficient statistic for p, and, in particular, r is not efficient for the 
estimation of p unless p is small. 

The effect of association on the expected values of the statistics used in testing significance 
may be examined, using this distribution. When both sets of scores are given, the appro- 
priate statistic is r. Clearly, with z defined as above 





$$r = \frac{\sum z}{n_{..}} - \frac{X_1Y_1}{n_{..}^2}.$$

Hence

$$E_M(r) = E_M(z) - \frac{X_1Y_1}{n_{..}^2}$$





= N p=, A, pr=,H, \_ 4%; 
= Bgfetl(1 +2) [1+ :: ieee -h* nt 
a, Le. _ PX ¥s—(n..— 2) (Xz Y,+Xj{¥,+2X; Yl, 

n —1 nm. (n,.—1)(,,—2) Fit 





(A 22) 
In the same way, it is found that


V(r) ‘an = 2 4 PL Xs¥s + 2(m.. — 2) Xj Yu, i 


1° n_(n_,—1)(n,,—2) (A 23) 





From the distribution given above, the expected values for $\chi^2$ corresponding to one set
of scores given, and for total $\chi^2$, may also be derived. Since the expressions are somewhat
complicated, they are not reproduced here. 
[Editorial note. The first method of measuring association in contingency tables referred
to by the author on p. 274, extensively used by Karl Pearson and his collaborators in the 
Galton Laboratory, was put forward by Pearson in 1913 in Biometrika, 9, 116. Briefly, 
the basis of the approach was as follows. In many cases where data have been arranged in 
a two-way contingency table, the categories used do not correspond to a set of clearly 
differentiated attributes but provide a convenient classification of what are in reality con- 
tinuously varying characters. This, for example, is the case when individuals are classified 
into four or five broad categories according to health or intelligence. 

In such cases the objective of the method was to obtain estimates, which would be con-
sistent for different groupings, of the correlation between the underlying variables and also 
of their regressions. The procedure assigned a single ‘class mark’ to each category deter- 
mined from the marginal distributions, found the correlation of these class marks and then 
sought to allow for the broad grouping by applying an adjustment termed the ‘class-index 
correction’. The correction was derived on the assumption that the underlying variables 
were normally correlated, but Pearson gave reasons for believing that the same correction 
would still be roughly applicable even when the relations between the variables differed 
considerably from the bivariate normal form. 

The derivation of the correction was not mathematically rigorous, but some investigation 
in progress involving artificial random sampling suggests that when the normal surface is 
broken up into 10-20 cells in this way and the correlation estimated from samples using 
equation (xii) of K. Pearson’s paper, the results are reasonably consistent and unbiased 
for different groupings. ] 
PROPERTIES OF DISTRIBUTIONS RESULTING FROM CERTAIN
SIMPLE TRANSFORMATIONS OF THE NORMAL DISTRIBUTION 


By J. DRAPER 
University College, London 


1. INTRODUCTION 
1.1. Preliminary remarks
This paper is concerned with some aspects of the systems of frequency curves proposed by 
Johnson (1949). The central idea in these systems is to transform the variable under 
consideration so that the transformed variable has a normal distribution; normal tables 
could then be used for the fitting of a frequency curve to the distribution of the variable. 


1.2. Outline of system


It is supposed that the variable $x$ is transformed to the unit normal variable $z$ by means of

$$z = \gamma + \delta f[(x-\xi)/\lambda],$$ (1)

where $\gamma$, $\delta$, $\xi$ and $\lambda$ are parameters and $f$ is a simple function of no variable parameters.
Three forms of f are used. 


(a) If $f$ is a natural logarithm, we are in effect using only three parameters instead of
four as

$$z = \gamma + \delta\log[(x-\xi)/\lambda] = \gamma' + \delta\log(x-\xi),$$ (2)

and the $(\beta_1, \beta_2)$ points of these 'log-normal' curves are restricted to lie on a line in the
$\beta_1, \beta_2$-plane. This marks the dividing line between the two main systems used. It is noted
that $x > \xi$, so that the log-normal or $S_L$ curves are bounded at one end and infinite at the
other.

(b) The transformation

$$z = \gamma + \delta\sinh^{-1}[(x-\xi)/\lambda] = \gamma + \delta\sinh^{-1} y,$$ (3)

where

$$y = (x-\xi)/\lambda,$$ (4)

provides a system of curves whose $(\beta_1, \beta_2)$ points completely cover that part of the
$\beta_1, \beta_2$-plane 'below' the log-normal line, i.e. that area covered by Pearson Types IV, V, VII
and part of Type VI. These curves have infinite tails and are termed the unbounded or
$S_U$ system.

(c) The transformation

$$z = \gamma + \delta\log[(x-\xi)/(\xi+\lambda-x)] = \gamma + \delta\log[y/(1-y)]$$ (5)

provides a system of curves whose $(\beta_1, \beta_2)$ points cover that part of the $\beta_1, \beta_2$-plane between
the log-normal line and the boundary, $\beta_2-\beta_1-1 = 0$, of possible frequency distributions,
i.e. the area covered by Pearson Types I, II, III, and part of Type VI. We see that $\xi < x < \xi+\lambda$
or $0 < y < 1$, so that these curves are bounded at each end, and termed the bounded or
$S_B$ system.

These three systems of curves $S_L$, $S_U$ and $S_B$ together cover the entire 'possible' region
of the $\beta_1, \beta_2$-plane.
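
The three transformations (2), (3) and (5), together with their inverses, may be sketched as follows. This is an illustration rather than part of the original paper; the function names and the string labels "SL", "SU" and "SB" are hypothetical, and $\lambda$ is assumed positive.

```python
import math


def johnson_z(x, gamma, delta, xi, lam, system):
    """Transform x to the unit normal variable z for the chosen Johnson system."""
    y = (x - xi) / lam
    if system == "SL":  # log-normal, equation (2): requires x > xi
        if y <= 0:
            raise ValueError("S_L requires x > xi")
        return gamma + delta * math.log(y)
    if system == "SU":  # unbounded, equation (3)
        return gamma + delta * math.asinh(y)
    if system == "SB":  # bounded, equation (5): requires xi < x < xi + lam
        if not 0 < y < 1:
            raise ValueError("S_B requires xi < x < xi + lam")
        return gamma + delta * math.log(y / (1 - y))
    raise ValueError("system must be 'SL', 'SU' or 'SB'")


def johnson_x(z, gamma, delta, xi, lam, system):
    """Inverse transformation: recover x from the unit normal variable z."""
    u = (z - gamma) / delta
    if system == "SL":
        y = math.exp(u)
    elif system == "SU":
        y = math.sinh(u)
    elif system == "SB":
        y = 1.0 / (1.0 + math.exp(-u))
    else:
        raise ValueError("system must be 'SL', 'SU' or 'SB'")
    return xi + lam * y


for system, x in (("SL", 3.7), ("SU", -2.4), ("SB", 1.3)):
    z = johnson_z(x, 0.5, 1.8, 1.0, 2.0, system)
    print(system, round(z, 4), round(johnson_x(z, 0.5, 1.8, 1.0, 2.0, system), 4))
```

Once the four parameters are known, the probability integral at $x$ is the normal probability integral evaluated at `johnson_z(x, ...)`.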
While these systems of curves can be employed for fitting observational frequency
distributions (some examples will be given of this application), they are perhaps of greater
value in representing sampling distributions for which certain moments only are known.
The $S_U$ system seems likely to prove useful in these cases as an alternative to the Pearson
Type IV curve which, comparatively, is more difficult to handle. Investigations in progress
in the Department of Statistics, University College, London, indicate that $S_U$ curves fit
certain sampling distributions remarkably well. As an example, we show the results of
employing this curve to represent the distribution of non-central $t$, used in determining the
power function of the $t$-test. If $y$ is a normally distributed variable with expectation zero
and variance $\sigma_y^2$, and if $s^2$ is an independent mean-square estimate of $\sigma_y^2$ based on $\nu$ degrees
of freedom, then $t' = (y+\Delta)/s$ follows the non-central $t$-distribution, which depends only
on $\nu$ and the ratio $\Delta/\sigma_y$. Each of the approximate results shown below was obtained from
a separate $S_U$ distribution having the correct first four moments of the appropriate non-central
$t$. The table shows the chance of establishing significance at the 2½ % level (single-tailed
test) when $\nu = 9$, for eight different values of $\Delta/\sigma_y$.


Power of t-test with 9 degrees of freedom*

                                   Approximate
$\Delta/\sigma_y$      True power        value
0.350          0.050           0.050
0.756          0.100           0.100
1.605          0.300           0.296
2.195          0.500           0.500
2.789          0.700           0.703
3.651          0.900           0.904
4.066          0.950           0.951
4.849          0.990           0.987


* I am indebted to Mr K. Dennis of the Department of Statistics, University College, London, for 
allowing me to quote these results. 


1.3. Scope of present paper

In this paper we are chiefly concerned with the problem of estimating the parameters
$\gamma$, $\delta$, $\xi$ and $\lambda$ from the observed moments. Once the four parameters are known, values of
the probability integral corresponding to given values of $x$ can easily be obtained using
tables of natural logarithms or hyperbolic sines, and the normal probability function.

Calculation of the parameters of a log-normal curve presents no difficulty using the first
three moments of the distribution of $x$. In Johnson's paper, the parameters of an $S_U$ curve
were found using an abac. In the present paper, simple approximate algebraic expressions
are found to give the parameters in terms of the first four moments of the observed variable
for $S_U$ curves. Certain other aspects of $S_U$ curves are also considered. A similar investigation
has been carried out for $S_B$ curves using methods outlined in §3. It is hoped to publish fuller
details of the results of this work in a later paper.


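2. FITTING $S_U$ CURVES
2.1. Introduction
Johnson (1949, pp. 163-4) deals with the moments of the $S_U$ curves. From (3) we see that

$$\mu_r'(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,2^{-r}\{e^{(z-\gamma)/\delta} - e^{-(z-\gamma)/\delta}\}^r\,dz,$$

from which it follows that the first four moments of $y$ are

$$\mu_1' = -\omega^{1/2}\sinh\Omega,$$
$$\mu_2 = \tfrac{1}{2}(\omega-1)(\omega\cosh 2\Omega + 1),$$
$$\mu_3 = -\tfrac{1}{4}\omega^{1/2}(\omega-1)^2\{\omega(\omega+2)\sinh 3\Omega + 3\sinh\Omega\},$$
$$\mu_4 = \tfrac{1}{8}(\omega-1)^2\{\omega^2(\omega^4+2\omega^3+3\omega^2-3)\cosh 4\Omega + 4\omega^2(\omega+2)\cosh 2\Omega + 3(2\omega+1)\},$$ (6)

where $\omega = e^{\delta^{-2}}$, $\Omega = \gamma/\delta$.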
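
The moments (6) and the moment ratios $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$ derived from them are easily evaluated numerically. The sketch below is illustrative; the function names are hypothetical and `cap_omega` stands for $\Omega$.

```python
import math


def su_moments(omega, cap_omega):
    """Mean and central moments mu_2, mu_3, mu_4 of y from equations (6)."""
    mean = -math.sqrt(omega) * math.sinh(cap_omega)
    mu2 = 0.5 * (omega - 1) * (omega * math.cosh(2 * cap_omega) + 1)
    mu3 = -0.25 * math.sqrt(omega) * (omega - 1) ** 2 * (
        omega * (omega + 2) * math.sinh(3 * cap_omega) + 3 * math.sinh(cap_omega)
    )
    mu4 = 0.125 * (omega - 1) ** 2 * (
        omega ** 2 * (omega ** 4 + 2 * omega ** 3 + 3 * omega ** 2 - 3) * math.cosh(4 * cap_omega)
        + 4 * omega ** 2 * (omega + 2) * math.cosh(2 * cap_omega)
        + 3 * (2 * omega + 1)
    )
    return mean, mu2, mu3, mu4


def su_betas(omega, cap_omega):
    """Moment ratios beta_1 and beta_2 of an S_U curve."""
    _, mu2, mu3, mu4 = su_moments(omega, cap_omega)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


beta1, beta2 = su_betas(1.1, 0.1)
print(round(beta1, 4), round(beta2, 4))
```

For $\omega = 1.1$, $\Omega = 0.1$ this gives $\beta_1 \approx 0.0103$ and $\beta_2 \approx 3.4560$, the first entries of Table 1 in §2.4.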


Johnson computed extensive series of values of $\beta_1$ and $\beta_2$ for varying $\gamma$ and $\delta$, and plotted
an abac showing the lines on which $\delta$ = constant and $\gamma$ = constant in the $\beta_1, \beta_2$-plane. The
abac covers that part of the plane given by $0 < \beta_1 < 1.2$, $3.0 < \beta_2 < 5.0$, although a more
extensive one is available (Johnson, 1948). In fitting a curve, values of $\gamma$ and $\delta$ could be
read off from the abac corresponding to the $\beta_1$ and $\beta_2$ ratios of the observed variable.
$\mu_1'(y)$ and $\sigma(y)$ could then be computed from (6), and as we see from (4) that

$$\sigma(x) = \lambda\sigma(y)$$
and
$$\mu_1'(x) = \xi + \lambda\mu_1'(y),$$

$\xi$ and $\lambda$ could be found using the first two moments of the observed variable.

It was felt that an algebraic method of finding $\gamma$ and $\delta$ would be preferable to this
diagrammatic method because:

(a) visual interpolation is necessary with its attendant inaccuracies;

(b) mistakes can easily be made in reading along the diagonal lines of the abac (Johnson's
fourth example (p. 171) gives $\delta = 3.55$, $\gamma = 2.13$ whereas the correct values as read from the
abac are $\delta = 3.40$, $\gamma = 1.70$).

The ideal method of determining $\gamma$ and $\delta$ would be to invert expressions (6) to give $\omega$ and
$\Omega$ in terms of $\beta_1$ and $\beta_2$, but as this was found impracticable simple expressions for $\gamma$ and
$\delta$ were sought empirically.

2.2. Approximate formula for $\delta$

It is seen from the abac that the lines $\delta$ = constant are almost straight and parallel. Hence
for each $\delta$ = constant, $\beta_2 - 3 - m\beta_1$ is approximately constant whatever the value of $\Omega$,
where $m$ is the common slope of the lines. A mean value of $m = 1.376$ has been taken. Consequently
there is an approximate functional relationship between $\delta$ and $\theta = \beta_2 - 3 - 1.376\beta_1$
independent of $\Omega$. The form of the graph of $\delta$ plotted against $\theta$ suggested the transformation
$\omega = e^{\delta^{-2}}$ to obtain a straight line. The graph resulting from this transformation is slightly
curved, however, so that a parabola was needed to describe the points adequately. The
curve fitted is

$$\hat\omega = 1 + 0.22675\theta - 0.02525\theta^2,$$ (7)

where $\omega = e^{\delta^{-2}}$ and $\theta = \beta_2 - 3 - 1.376\beta_1$. Hence $\hat\delta = (\log\hat\omega)^{-1/2}$ is immediately available.
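
Formula (7) and the resulting estimate of $\delta$ may be sketched as below. The function names are hypothetical; the check values are the moment ratios $\beta_1 = 0.829$, $\beta_2 = 4.863$ of the bean-length data used in §2.6. The estimate of $\delta$ exists only when $\hat\omega > 1$, i.e. for positive $\theta$ within the range for which (7) was fitted.

```python
import math


def omega_hat(beta1, beta2):
    """Uncorrected estimate of omega = exp(1/delta**2) from formula (7)."""
    theta = beta2 - 3.0 - 1.376 * beta1
    return 1.0 + 0.22675 * theta - 0.02525 * theta ** 2


def delta_hat(beta1, beta2):
    """Uncorrected estimate of delta, equal to (log omega_hat)**(-1/2)."""
    return 1.0 / math.sqrt(math.log(omega_hat(beta1, beta2)))


print(round(omega_hat(0.829, 4.863), 2), round(delta_hat(0.829, 4.863), 2))
```

For these data the sketch gives $\hat\omega \approx 1.15$ and $\hat\delta \approx 2.67$.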


2.3. Approximate formulae for $\gamma$

The lines $\Omega$ = constant of the abac are almost straight and pass through the point
$(\beta_1 = 0, \beta_2 = 3)$. This suggests that for each line $\Omega$ = constant,

$$\phi = \beta_1/(\beta_2 - 3)$$

is approximately constant, whatever the value of $\delta$. Therefore there is an approximate
functional relationship between $\Omega$ and $\phi$. Values of $\phi$ were measured from the abac and
plotted against $\Omega$. The shape of this graph suggested the use of $\log\{\phi_0/(\phi_0-\phi)\}$, where
$\phi_0$ is the approximate value of $\phi$ on the log-normal line. (Note that $d\Omega/d\phi = \infty$ at $\phi = \phi_0$.)
The graph of $\Omega$ plotted against $\log\{\phi_0/(\phi_0-\phi)\}$ is adequately described by a straight line
for the range of values $0.2 < \Omega < 1.5$, i.e. $0.1 < \phi < 0.52$. Below this range, i.e. for $\phi < 0.1$,
a simple power curve was fitted to the points of $\Omega$ plotted against $\phi$. Above this range,
i.e. for $\phi > 0.52$, $(\beta_1, \beta_2)$ points lie almost on the log-normal line and in practice a log-normal
curve would be fitted. Hence we have:

for $\beta_1/(\beta_2-3) < 0.1$,

$$\hat\Omega = 0.7739\,\phi^{0.5473},$$ (8)

for $0.1 < \beta_1/(\beta_2-3) < 0.52$,

$$\hat\Omega = 0.1456 + 0.4436\log_e\{\phi_0/(\phi_0-\phi)\},$$ (9)

where $\phi = \beta_1/(\beta_2-3)$, $\phi_0 = 0.545$;

for $0.52 < \beta_1/(\beta_2-3)$, fit a log-normal curve.
Having found $\hat\delta$ previously, we immediately have $\hat\gamma = \hat\delta\hat\Omega$.
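
The piecewise rule of formulae (8) and (9) can be written as a single function. This is a sketch under the stated ranges only; the function name is hypothetical, and the treatment of the boundary values $\phi = 0.1$ and $\phi = 0.52$, which the text leaves open, is an assumption.

```python
import math

PHI_0 = 0.545  # approximate value of phi on the log-normal line


def cap_omega_hat(beta1, beta2):
    """Uncorrected estimate of Omega = gamma/delta from formulae (8) and (9)."""
    phi = beta1 / (beta2 - 3.0)
    if phi < 0.1:
        return 0.7739 * phi ** 0.5473  # formula (8)
    if phi < 0.52:
        return 0.1456 + 0.4436 * math.log(PHI_0 / (PHI_0 - phi))  # formula (9)
    raise ValueError("phi >= 0.52: a log-normal curve would be fitted instead")


print(round(cap_omega_hat(0.829, 4.863), 2))
```

For the bean-length data of §2.6 ($\beta_1 = 0.829$, $\beta_2 = 4.863$, $\phi = 0.445$) this gives $\hat\Omega \approx 0.90$.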
2.4. Verification of results
To check the accuracy of formulae (7), (8) and (9), various values of $\omega$ and $\Omega$ were taken
and the corresponding values of $\beta_1$ and $\beta_2$ for each pair of values calculated from

$$\beta_1 = \frac{\omega(\omega-1)\{\omega(\omega+2)\sinh 3\Omega + 3\sinh\Omega\}^2}{2(\omega\cosh 2\Omega + 1)^3},$$

$$\beta_2 = \frac{\omega^2(\omega^4+2\omega^3+3\omega^2-3)\cosh 4\Omega + 4\omega^2(\omega+2)\cosh 2\Omega + 3(2\omega+1)}{2(\omega\cosh 2\Omega + 1)^2}.$$

These values were substituted back in (7), (8) and (9) to obtain the estimates $\hat\omega$ and $\hat\Omega$.


A few of the results, together with the corresponding values of $\beta_1$ and $\beta_2$, are given in
Table 1.

Table 1

 $\Omega$                   $\omega = 1.1$    $\omega = 1.3$    $\omega = 1.5$
 0.1    $\beta_1$          0.0103       0.0400       0.0835
        $\beta_2$          3.4560       4.6744       6.4038
        $\hat\omega$       1.0953       1.3010       1.4726
        $\hat\Omega$       0.0973       0.1003       0.1017
 0.5    $\beta_1$          0.2182       0.8216       1.6719
        $\beta_2$          3.7386       5.7728       8.7534
        $\hat\omega$       1.0946       1.3044       1.4820
        $\hat\Omega$       0.4921       0.4936       0.4836
 1.0    $\beta_1$          0.5758       2.0628       4.0282
        $\beta_2$          4.2268       7.5623      12.3118
        $\hat\omega$       1.0938       1.3160       1.4961
        $\hat\Omega$       1.0214       0.9306       0.8459

It is noted that for the majority of curves met in practice the estimates $\hat\omega$ and $\hat\Omega$ are good,
but that for large values of $\beta_1$ and $\beta_2$ the error in $\hat\Omega$ is considerable. The reason for this is
that the points involved fall well outside that area of the $\beta_1, \beta_2$-plane covered by the abac,
and the lines $\delta$ = constant and $\Omega$ = constant are here distinctly curved. Hence the straight-line
approximation is invalidated.
It might also be noted here that in practice even fairly large errors in $\gamma$ and $\delta$ do not
affect fitted frequencies appreciably, so long as the values of $\xi$ and $\lambda$ calculated from the
first two moments are accurate.

Correction factors to be added to $\hat\omega$ and $\hat\Omega$ were obtained empirically and are given in the
next section. They are such that the new estimates $\hat\omega'$ and $\hat\Omega'$ are accurate to at least two
figures for the range of values covered.


Fig. 1. System $S_U$.
2.5. Summary of results
To estimate $\delta$

$$\hat\omega = 1 + 0.22675\theta - 0.02525\theta^2,$$ (7 bis)

where $\omega = e^{\delta^{-2}}$, $\theta = \beta_2 - 3 - 1.376\beta_1$.

Correction $E_\omega$ to be added to $\hat\omega$ to give $\hat\omega'$ is obtained from Fig. 1. Then $\hat\delta' = (\log_e\hat\omega')^{-1/2}$. For
values < 1.2, $\hat\omega'$ is correct to three figures; for greater values it is correct to two figures.

To estimate $\gamma$

(i) For $\beta_1/(\beta_2-3) < 0.1$,

$$\hat\Omega = 0.7739\,\phi^{0.5473},$$ (8 bis)

or

$$\log_{10}\hat\Omega = -0.1113 + 0.5473\log_{10}\phi,$$

where $\Omega = \gamma/\delta$, $\phi = \beta_1/(\beta_2-3)$.
No correction is needed; $\hat\Omega$ correct to two figures.

(ii) For $0.1 < \beta_1/(\beta_2-3) < 0.52$,

$$\hat\Omega = 0.1456 + 0.4436\log_e\{\phi_0/(\phi_0-\phi)\}$$ (9 bis)
$$\phantom{\hat\Omega} = 0.1456 + 1.0214\log_{10}\{\phi_0/(\phi_0-\phi)\},$$

where $\phi_0 = 0.545$.
Corrections $E_\Omega$ to be added to $\hat\Omega$ to give $\hat\Omega'$ are as follows:

(a) For $\hat\Omega < 0.5$,

$$E_{\Omega 1} = (-0.1800 + 1.1825\hat\Omega - 1.7422\hat\Omega^2) + \omega(0.0545 - 0.4938\hat\Omega + 0.8713\hat\Omega^2).$$ (10)

(b) For $\hat\Omega > 0.5$,

$$E_{\Omega 2} = (-0.306 + 1.4775\hat\Omega - 1.8197\hat\Omega^2) + \omega(0.3567 - 1.5438\hat\Omega + 1.7600\hat\Omega^2).$$ (11)

$\hat\Omega'$ correct to two figures.

(iii) For $0.52 < \beta_1/(\beta_2-3)$, a log-normal curve is fitted.

Then $\hat\gamma' = \hat\delta'\hat\Omega'$. $\xi$ and $\lambda$ are calculated as indicated previously.

It must be noted that these formulae cannot be applied to all values of $\beta_1$ and $\beta_2$. They
can be used well outside the range of values of $\beta_1$ and $\beta_2$ covered by the original abac
($\beta_1 < 1.2$, $3.0 < \beta_2 < 5.0$), but have been checked only in that region considered most likely
to be met in practice.

The limits for estimating $\Omega$ are indicated in formulae (8) and (9).
Formula (7), to estimate $\delta$, has only been checked for $\omega < 1.5$ and should not be used for
values greater than this, i.e. as $\omega = 1.5$ corresponds to a mean estimate of $\hat\omega = 1.4854$, we
must apply the limit

$$1.4854 > 1 + 0.22675\theta - 0.02525\theta^2$$

or

$$\theta^2 - 8.9802\theta + 19.224 > 0,$$

and as $\theta = \beta_2 - 3 - 1.376\beta_1$, this gives

$$\beta_2 - 1.376\beta_1 < 6.52.$$ (12)

Outside this limit formula (7) should not be used.

It is noted that to employ the correction formulae (10) and (11), $\omega$ is assumed known.
Hence $\omega$ will have to be estimated first from (7) and corrected using Fig. 1 before the
corrected estimate of $\Omega$ is found.
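
The correction formulae (10) and (11) may be sketched as a single function of the uncorrected $\hat\Omega$ and the corrected $\omega$. The function name is hypothetical. The text gives (10) for $\hat\Omega < 0.5$ and (11) for $\hat\Omega > 0.5$; assigning the boundary value 0.5 to (11) is an assumption of the sketch. The correction from Fig. 1 cannot be reproduced here, so $\omega$ is taken as given.

```python
def cap_omega_correction(cap_omega_est, omega):
    """Correction E_Omega to add to the uncorrected estimate, for 0.1 < phi < 0.52."""
    q = cap_omega_est
    if q < 0.5:  # formula (10)
        return (-0.1800 + 1.1825 * q - 1.7422 * q * q) + omega * (
            0.0545 - 0.4938 * q + 0.8713 * q * q
        )
    # formula (11); the boundary value 0.5 is placed here by assumption
    return (-0.306 + 1.4775 * q - 1.8197 * q * q) + omega * (
        0.3567 - 1.5438 * q + 1.7600 * q * q
    )


# Values of Example 2 (breadth of beans): uncorrected estimate 0.4949, omega 1.09.
correction = cap_omega_correction(0.4949, 1.09)
print(round(correction, 4), round(0.4949 + correction, 2))
```

With the values of Example 2 the correction is about 0.004, giving a corrected estimate of 0.50.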

2.6. Examples

The same data are used in these examples as were used by Johnson. These distributions
of length and breadth of beans were originally fitted by Pretorius (1930) with Pearson
Type IV curves. The distribution of bean length (Table 2) falls off so steeply at the upper
terminal that the method of moments will in this case only provide a first approximation
to the parameters. The example, however, illustrates the procedures described in the
earlier sections.

Example 1. Length of beans
The constants of the observed data are:
Mean = 14.399 mm.; $\beta_1 = 0.829$.
Standard deviation = 0.9036 mm.; $\beta_2 = 4.863$.
Now $\theta = \beta_2 - 3 - 1.376\beta_1 = 0.722$. Hence from (7),

$$\hat\omega = 1 + 0.22675\theta - 0.02525\theta^2 = 1.15,$$

and therefore $\hat\delta = 2.67$.

Also $\phi = \beta_1/(\beta_2-3) = 0.445$. Hence from (9),

$$\hat\Omega = 0.1456 + 1.0214\log_{10}\{0.545/(0.545-\phi)\} = 0.90,$$

and therefore $\hat\gamma = \hat\delta\hat\Omega = 2.40$.
From (6) we calculate $\mu_1'\{(x-\xi)/\lambda\} = -1.0976$, $\sigma(x/\lambda) = 0.5861$,
whence $\lambda = 0.9036/0.5861 = 1.5416$, $\xi = 14.399 + 1.6921 = 16.0911$.
An $S_U$ curve was fitted to the data using these uncorrected values of the parameters.
The fit was not expected to be good without making corrections to the estimates of $\gamma$ and
$\delta$, but the example is given to illustrate the procedure.
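
The whole uncorrected procedure of Example 1 can be collected into one routine. The sketch below is an illustration, not the author's computation: the function name and the argument `gamma_sign` are hypothetical, the latter being needed because $\beta_1$ does not carry the sign of the skewness (here $\gamma$ is positive, the curve being negatively skew), and unrounded intermediate values are carried throughout.

```python
import math


def fit_su_uncorrected(mean, sd, beta1, beta2, gamma_sign=1.0):
    """Uncorrected S_U parameters (delta, gamma, lambda, xi) from four moments."""
    theta = beta2 - 3.0 - 1.376 * beta1
    omega = 1.0 + 0.22675 * theta - 0.02525 * theta ** 2  # formula (7)
    delta = 1.0 / math.sqrt(math.log(omega))
    phi = beta1 / (beta2 - 3.0)
    if phi < 0.1:
        cap_omega = 0.7739 * phi ** 0.5473  # formula (8)
    elif phi < 0.52:
        cap_omega = 0.1456 + 0.4436 * math.log(0.545 / (0.545 - phi))  # formula (9)
    else:
        raise ValueError("phi >= 0.52: a log-normal curve would be fitted instead")
    cap_omega *= gamma_sign
    gamma = delta * cap_omega
    # Mean and standard deviation of y = (x - xi)/lambda from equations (6).
    mean_y = -math.sqrt(omega) * math.sinh(cap_omega)
    sd_y = math.sqrt(0.5 * (omega - 1.0) * (omega * math.cosh(2.0 * cap_omega) + 1.0))
    lam = sd / sd_y
    xi = mean - lam * mean_y
    return delta, gamma, lam, xi


delta, gamma, lam, xi = fit_su_uncorrected(14.399, 0.9036, 0.829, 4.863)
print(round(delta, 2), round(gamma, 2), round(lam, 3), round(xi, 3))
```

The results agree with the values quoted in Example 1 to about three figures; small differences in the last place arise from rounding of intermediate quantities.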

For comparison, the values obtained using the abac are

$\delta = 2.64$, $\gamma = 2.38$, $\lambda = 1.5192$, $\xi = 16.0745$.

Also for comparison we give the values of $\beta_1$ and $\beta_2$ obtained by substituting in (6) the values
of $\gamma$ and $\delta$ obtained both from the formulae (7) and (9) and from the abac.

              $\delta$       $\gamma$       $\beta_1$      $\beta_2$
Observed       -        -       0.829     4.863
Formulae      2.67     2.40     0.808     4.813
Abac          2.64     2.38     0.855     4.871

Table 2 shows the observed frequencies, the frequencies calculated from the fitted $S_U$
curve both by abac and by formulae, and the frequencies calculated for the Type IV curve
fitted by Pretorius.


Table 2. Length of beans 





Length (central Observed $S_U$ $S_U$
values in mm.) frequencies (formulae) (abac) Type IV
< 9-25 — 2-5 2-6 1-9 
9-5 1 2-6 2-7 2| 
10-0 7 5-5 5:8 5-4 
10-5 18 11-7 12-1 11-3 
11-0 36 25-2 25:7 24-2 
11-5 70 54-3 55-2 52-5 
12-0 115 117-0 118-0 113-8 
12-5 199 247-7 249-3 243-7 
13-0 437 507-8 508-7 503-6 
13-5 929 971-0 970-6 968-9 
14-0 1787 1640-7 1642-5 1638-9 
14-5 2294 2236-7 2240-6 2229-8 
15-0 2082 2128-3 2130-3 2132-6 
15-5 1129 1157-2 1151-5 1181-6 
16-0 275 296-0 290-1 299-3 
16-5 55 33-6 32-2 28-5 
17-0 6 2-1 2-0 1 
> 17-25 ‘ins 0-1 oy ¢ 
Total 9440 9440 9440 9440 
$\chi^2$ (total) - 69.9 87.1 102.5
$\chi^2$ (partial)* - 52.2 66.3 70.1








* This row shows the contribution to the total $\chi^2$ from groups with length of bean below 16.25 mm.


Example 2. Breadth of beans
From the observed distribution:

Mean = 7.9755 mm.; $\beta_1 = 0.1943$.
Standard deviation = 0.3399 mm.; $\beta_2 = 3.6544$.
Now $\theta = \beta_2 - 3 - 1.376\beta_1 = 0.387$. Hence from (7),

$$\hat\omega = 1 + 0.22675\theta - 0.02525\theta^2 = 1.084.$$

From Fig. 1, the correction to be added is 0.0056. Hence $\hat\omega' = 1.09$ and therefore $\hat\delta' = 3.40$.
Also $\phi = \beta_1/(\beta_2-3) = 0.297$. Hence from (9),

$$\hat\Omega = 0.1456 + 1.0214\log_{10}\{0.545/(0.545-0.297)\} = 0.4949.$$

From (10) the correction to be added is

$$(-0.1800 + 1.1825\hat\Omega - 1.7422\hat\Omega^2) + 1.09(0.0545 - 0.4938\hat\Omega + 0.8713\hat\Omega^2) = -0.0214 + 1.09 \times 0.0235 = 0.0042.$$

Hence $\hat\Omega' = 0.50$, and therefore $\hat\gamma' = \hat\delta'\hat\Omega' = 1.70$.
From (6) we calculate

$$\mu_1'\{(x-\xi)/\lambda\} = -0.54415, \qquad \sigma(x/\lambda) = 0.34809,$$

whence $\lambda = 0.9765$, $\xi = 8.5069$.
These values of the parameters are exactly the same as found by reading from the abac.


Again we compare the observed moment ratios with those obtained by substituting
$\gamma$ and $\delta$ in (6):

                        $\gamma$       $\delta$       $\beta_1$       $\beta_2$
Observed                 -        -       0.1943     3.6544
Formulae and abac       1.70     3.40     0.1925     3.6530

Table 3 compares observed frequencies, frequencies of the fitted $S_U$ curve and those
of the Type IV curve fitted by Pretorius.


Table 3. Breadth of beans 


Breadth (central Observed $S_U$
values in mm.) frequencies (abac and formula) Type IV
< 6-25 ao 1-2 
6-375 4 ss “ss 
6-625 10 13-5 13-3 
6-875 72 50-7 49-9 
7125 170 178-4 177-2 
7-375 530 557-1 557-9 
7625 1397 1408-1 1413-3 
7-875 2579 2530-6 2530-5 
8-125 2742 2738-6 2732-5 
8-375 1483 1515-7 1515-4 
8-625 400 390-0 393-6 
8-875 48 48-8 48-6 
9-125 5 3-6) 
> 9-25 io 0-2) 3-0 
Total 9440 9440 9440 
$\chi^2$ - 13.63 14.36


It is interesting to note that when Johnson originally fitted an $S_U$ curve to the data of
Example 2, using incorrect values of $\delta = 3.55$ and $\gamma = 2.13$, the resulting frequencies were
not much different to those obtained using correct parameters, though $\chi^2$ was increased
from 13.63 to 17.47. Hence it seems that for the practical task of graduating given data the
prime consideration is the accurate determination of $\xi$ and $\lambda$ from the first two moments.


2.7. Note on skewness

From equations (6) we see that if $\gamma$ is positive, the inequalities mean < median < mode
hold, while if $\gamma$ is negative the direction of the inequalities is reversed. That is to say, $\gamma$ is of
opposite sign to the skewness of the curve; $\gamma$ is negative for a positively skew curve and vice
versa. This is important when substituting values of $\gamma$ and $\delta$ in equations (6) to obtain $\mu_1'(y)$
and $\sigma(y)$. For a positively skew curve $\mu_1'(y)$ is positive; hence an incorrect value for $\xi$ will be
obtained if the wrong sign of $\gamma$ is used.
For the $S_B$ system the opposite holds, i.e. $\gamma$ is positive for a positively skew curve and
vice versa.
2.8. Approximate normalization of the t-distribution

Anscombe (1950) has stated that if $t$ follows Student's distribution with $n$ degrees of
freedom, the transformed variable

$$y = \pm\sinh^{-1}\sqrt{3t^2/(2n)}$$ (13)

is approximately normal with variance $3/(2n-1)$.

This is an $S_U$ transformation, so we will compare this form with the result of fitting an
$S_U$ curve by moments.

If $t$ follows Student's distribution, with $n$ degrees of freedom,

$$\mu_1'(t) = 0, \quad \sigma(t) = \sqrt{n/(n-2)}, \quad \beta_1(t) = 0, \quad \beta_2(t) = 3(n-2)/(n-4).$$

Since $\beta_1(t) = 0$, $\gamma = 0$ and hence $\Omega = 0$. Putting $Y = (t-\xi)/\lambda$, from (6) we have

$$\mu_2(Y) = \tfrac{1}{2}(\omega-1)(\omega\cosh 2\Omega + 1) = \tfrac{1}{2}(\omega^2-1), \qquad \mu_1'(Y) = -\omega^{1/2}\sinh\Omega = 0,$$

and since $\mu_2(t) = \lambda^2\mu_2(Y)$ and $\mu_1'(t) = \xi + \lambda\mu_1'(Y)$,

$$\lambda^2 = 2n/\{(n-2)(\omega^2-1)\} \quad \text{and} \quad \xi = 0.$$

Hence the $S_U$ transformation (fitted by moments) applied to $t$ gives the following result:

$$z = \delta\sinh^{-1}\Big[t\Big\{\frac{(n-2)(\omega^2-1)}{2n}\Big\}^{1/2}\Big]$$ (14)

is approximately normal with unit variance. Applying formula (7) to estimate $\delta$ we have

$$\theta = \beta_2(t) - 1.376\beta_1(t) - 3 = 6/(n-4),$$

and so

$$e^{\delta^{-2}} = \omega = 1 + 0.22675\{6/(n-4)\} - 0.02525\{6/(n-4)\}^2.$$
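To compare transformations (13) and (14) we have to compare

$$\frac{(n-2)(\omega^2-1)}{2n} \ \text{with} \ \frac{3}{2n}, \quad \text{and} \quad \log\omega \ \text{with} \ \frac{3}{2n-1}.$$

Value of $\lambda$

$$\lambda^{-2} = \frac{(n-2)(\omega^2-1)}{2n} = \frac{n-2}{2n}\Big(\frac{6}{n-4}\,0.22675 - \frac{36}{(n-4)^2}\,0.02525\Big)\Big(2 + \frac{6}{n-4}\,0.22675 - \frac{36}{(n-4)^2}\,0.02525\Big) = \frac{1}{2n}\{2.721 + O(n^{-1})\},$$

which is to be compared with Anscombe's value of $3/2n$.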
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Value of 6 6% = (logw)— = flog (1+ X)}-, 
where X = o 0-22675 — an pe 02525 
= (x-F+3-..)" 
= =n 1-949 +22 +0(3) 


which is to be compared with Anscombe’s value of (2n — 1)/3. 

More important than these comparisons, however, is the comparison of the percentage
points as given by (13) and (14) with the true percentage points as given by the $t$-distribution.
This is easily done, using normal tables and tables of $t$. Thus we find the following percentage
points (two tails), the results obtained by fitting an $S_U$ curve by moments being given under
(14), the results obtained by using Anscombe's formula under (13), and the correct values
under $t$.








          50 %                   20 %                   5 %                    1 %
 n    (14)   (13)    t       (14)   (13)    t       (14)   (13)    t       (14)   (13)    t
 5    0.736  0.729  0.727    1.502  1.478  1.476    2.608  2.536  2.571    3.995  3.833  4.032
10    0.696  0.700  0.700    1.371  1.372  1.372    2.238  2.220  2.228    3.189  3.129  3.169
20    0.686  0.687  0.687    1.325  1.325  1.325    2.088  2.084  2.086    2.850  2.836  2.845
30    0.683  0.683  0.683    1.310  1.310  1.310    2.043  2.041  2.042    2.753  2.746  2.753


As is seen from this table, both (13) and (14) give close approximations to the true values,
Anscombe's formula being better in the main part of the distribution and the transformation
obtained by fitting an $S_U$ curve by moments being rather better in the tails.

Equation (13), of course, can be used for degrees of freedom less than 5, but in this case
(14) cannot be used when estimating $\delta$ by (7), because when $n = 4$ the $\theta$ of formula (7)
becomes infinite.
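
The percentage points quoted under (13) and (14) can be recomputed by inverting each transformation at the corresponding normal percentage point. The following is a minimal sketch, assuming the forms of (13) and (14) given above; the function names are illustrative only, and the standard-library `NormalDist` is used in place of normal tables.

```python
import math
from statistics import NormalDist


def anscombe_point(n, two_tail_prob):
    """Two-tailed percentage point of t implied by transformation (13)."""
    z = NormalDist().inv_cdf(1.0 - two_tail_prob / 2.0)
    y = z * math.sqrt(3.0 / (2.0 * n - 1.0))      # y has variance 3/(2n-1)
    return math.sinh(y) / math.sqrt(3.0 / (2.0 * n))


def su_moment_point(n, two_tail_prob):
    """Two-tailed percentage point of t implied by transformation (14); needs n > 4."""
    theta = 6.0 / (n - 4.0)
    omega = 1.0 + 0.22675 * theta - 0.02525 * theta ** 2
    delta = 1.0 / math.sqrt(math.log(omega))       # delta^2 = 1 / log(omega)
    inv_lambda = math.sqrt((n - 2.0) * (omega ** 2 - 1.0) / (2.0 * n))
    z = NormalDist().inv_cdf(1.0 - two_tail_prob / 2.0)
    return math.sinh(z / delta) / inv_lambda


for n in (5, 10, 20, 30):
    row = [f"{f(n, p):.3f}" for p in (0.5, 0.2, 0.05, 0.01)
           for f in (su_moment_point, anscombe_point)]
    print(n, *row)
```

Each printed row lists, for the 50, 20, 5 and 1 % points in turn, the value from (14) followed by the value from (13).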


3. FITTING $S_B$ CURVES
3.1. Introduction
The $S_B$ transformation as given in (5) is
$$z = \gamma + \delta\log\{y/(1-y)\}, \quad\text{where}\quad y = (x-\xi)/\lambda.$$


As $z$ is a unit normal variable, the $r$th moment of $y$ about zero is

$$\mu_r'(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,(1 + e^{(\gamma-z)/\delta})^{-r}\,dz. \tag{15}$$

This integral is not easy to evaluate directly; but Johnson has evaluated $\mu_1'(y)$ directly and
the higher order moments are found from the relation
$$\mu_{r+1}' = \mu_r' + \frac{\delta}{r}\,\frac{\partial\mu_r'}{\partial\gamma}.$$

The computational labour involved in this method is considerable, and values of the first
four moments were calculated for only a few values of $\gamma$ and $\delta$.

We will now describe a method of calculating moments of $S_B$ curves using a formula due
to Goodwin (1949). The numerical results of calculations based on these formulae will be
given in a later paper.
3.2. Moments of the $S_B$ system
Goodwin (1949) has proposed a quadrature formula for the evaluation of integrals of
the form $\int_{-\infty}^{\infty} f(x)e^{-x^2}\,dx$.
His formula is

$$\int_{-\infty}^{\infty} f(x)e^{-x^2}\,dx = h\sum_{n=-\infty}^{\infty} f(nh)e^{-n^2h^2} - E(h). \tag{16}$$


If, without loss of generality, it is assumed that $f(x)$ is an even function of $x$, the error term
$E(h)$ is given by
$$E(h) = 2e^{-\pi^2/h^2}\int_{-\infty}^{\infty} f(y - i\pi/h)\,e^{-y^2}\,dy.$$


This error term takes a very simple form if we assume that its main contribution comes
from the neighbourhood of $y = 0$; in this case

$$E(h) \approx 2\sqrt{\pi}\,e^{-\pi^2/h^2} f(i\pi/h). \tag{17}$$

This will seriously underestimate the error term only if $f(x)$ tends rapidly to infinity with $x$.
We see that formula (16) is immediately applicable to the evaluation of $S_B$ moments if
we put (15) in the form

$$\mu_r'(y) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-w^2}\,(1 + e^{(\gamma-\sqrt2\,w)/\delta})^{-r}\,dw, \tag{18}$$
where $\sqrt2\,w = z$ and $(1 + e^{(\gamma-\sqrt2\,w)/\delta})^{-r} = f(w)$, i.e. applying (16) we have
$$\mu_r'(y) = \frac{h}{\sqrt{\pi}}\sum_{n=-\infty}^{\infty}(1 + e^{(\gamma-\sqrt2\,nh)/\delta})^{-r}\,e^{-n^2h^2} - E_r(h), \tag{19}$$

where $E_r(h)$ is the error term for the $r$th moment.

Now $f(w)$ of (18) is not an even function of $w$; so for the evaluation of $E_r(h)$ we consider
$F(w) = \tfrac12\{f(w)+f(-w)\}$ instead of $f(w)$. $F(w)$ tends to $\tfrac12$ as $w$ tends to infinity and therefore
we may apply (17). Hence the error term of (19) assumes the form

$$E_r(h) = e^{-\pi^2/h^2}\{(1 + e^{(\gamma-\sqrt2\,i\pi/h)/\delta})^{-r} + (1 + e^{(\gamma+\sqrt2\,i\pi/h)/\delta})^{-r}\}.$$
We are interested in the first four moments and for $r = 1, 2, 3, 4$, $E_r(h)$ can be expressed as

$$E_r(h) = 2e^{-\pi^2/h^2}\,C_r/D^r \quad (r = 1, 2, 3, 4),$$
where, writing $\phi = \sqrt2\,\pi/(\delta h)$,

$$D = 1 + 2e^{\gamma/\delta}\cos\phi + e^{2\gamma/\delta},$$
$$C_1 = 1 + e^{\gamma/\delta}\cos\phi,$$
$$C_2 = 1 + 2e^{\gamma/\delta}\cos\phi + e^{2\gamma/\delta}\cos 2\phi,$$
$$C_3 = 1 + 3e^{\gamma/\delta}\cos\phi + 3e^{2\gamma/\delta}\cos 2\phi + e^{3\gamma/\delta}\cos 3\phi,$$
$$C_4 = 1 + 4e^{\gamma/\delta}\cos\phi + 6e^{2\gamma/\delta}\cos 2\phi + 4e^{3\gamma/\delta}\cos 3\phi + e^{4\gamma/\delta}\cos 4\phi.$$
We see that $E_r(h)$ tends rapidly to zero with $h$, and in fact for the values of $h$ actually used
in evaluating moments it was negligible.
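
The quadrature (19) and its error estimate are simple to program. The sketch below is an illustration only: the function names, the default step $h = 0.5$ and the truncation of the infinite sum at $|nh| \le 8$ are assumptions of this example, not values taken from the paper. Since the estimate $E_r(h)$ rests on approximation (17), repeating the calculation with a smaller step is a useful independent check.

```python
import math


def sb_moment(r, gamma, delta, h=0.5, w_max=8.0):
    """r-th moment about zero of the S_B variate y: the sum in (19), error term dropped."""
    n_max = int(math.ceil(w_max / h))              # terms beyond |w| = w_max are negligible
    total = 0.0
    for n in range(-n_max, n_max + 1):
        w = n * h
        f = (1.0 + math.exp((gamma - math.sqrt(2.0) * w) / delta)) ** (-r)
        total += f * math.exp(-w * w)
    return h * total / math.sqrt(math.pi)


def error_term(r, gamma, delta, h=0.5):
    """Estimated error E_r(h) = 2 exp(-pi^2/h^2) C_r / D^r, with C_r in binomial form."""
    a = math.exp(gamma / delta)
    phi = math.sqrt(2.0) * math.pi / (delta * h)
    d = 1.0 + 2.0 * a * math.cos(phi) + a * a
    c = sum(math.comb(r, k) * a ** k * math.cos(k * phi) for k in range(r + 1))
    return 2.0 * math.exp(-math.pi ** 2 / h ** 2) * c / d ** r


for r in (1, 2, 3, 4):
    print(r, sb_moment(r, 0.5, 1.0), sb_moment(r, 0.5, 1.0, h=0.25), error_term(r, 0.5, 1.0))
```

For very small $\delta$ the exponential inside `sb_moment` can overflow, so the sketch is intended for moderate values of $\delta$.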


3.3. Bimodal boundaries
Johnson has found the necessary and sufficient conditions for an $S_B$ curve to be bimodal.
They are
$$\delta < 1/\sqrt2, \qquad |\gamma| < [\sqrt{1-2\delta^2} - 2\delta^2\tanh^{-1}\sqrt{1-2\delta^2}]/\delta.$$

Using the second condition and replacing the inequality by an equality sign, pairs of values
of $\gamma$ and $\delta$ were found, and the corresponding values of $\mu_1'$, $\sigma$, $\beta_1$ and $\beta_2$ computed using the
quadrature formula (19).

These figures were used to plot the line in the $\beta_1$, $\beta_2$-plane 'above' which $S_B$ curves are
bimodal; this is shown in Fig. 2. For comparison, the same figure also shows, as a broken line,
the curve 'above' which Pearson Type I curves are U-shaped and the curve 'above'
which all continuous distributions are bimodal (the 'absolute bimodal boundary'), which
has been discussed by Johnson & Rogers (1951).
Fig. 2. Bimodal boundaries.
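
Johnson's bimodality conditions of Section 3.3 translate directly into a small routine giving the limiting value of $|\gamma|$ for each $\delta$. This is a sketch with hypothetical function names; the guard for extremely small $\delta$, where the square root rounds to 1 in floating point, is an implementation detail of the example.

```python
import math


def bimodal_gamma_limit(delta):
    """Upper limit of |gamma| for a bimodal S_B curve, or None when delta >= 1/sqrt(2)."""
    if not 0.0 < delta < 1.0 / math.sqrt(2.0):
        return None
    root = math.sqrt(1.0 - 2.0 * delta ** 2)
    if root >= 1.0:                                # delta so small that root rounds to 1
        return 1.0 / delta
    return (root - 2.0 * delta ** 2 * math.atanh(root)) / delta


def is_bimodal(gamma, delta):
    """True when (gamma, delta) satisfies both of Johnson's conditions."""
    limit = bimodal_gamma_limit(delta)
    return limit is not None and abs(gamma) < limit


for delta in (0.1, 0.3, 0.5, 0.7):
    print(delta, bimodal_gamma_limit(delta))
```

Pairs $(\gamma, \delta)$ lying on the boundary are obtained by setting $|\gamma|$ equal to the returned limit.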
ON A TWO-SIDED SEQUENTIAL t-TEST


By S. RUSHTON 
Imperial College, London 


1. INTRODUCTION 


In a previous paper (Rushton, 1950) the author has discussed a one-sided sequential $t$-test
and suggested the use of a certain approximation for carrying out the test in practice. The
test discussed in that paper was one for deciding whether the mean $\mu$ of a variate $X$, assumed
normally distributed with unknown standard deviation $\sigma$, is given by $\mu/\sigma = \delta$ or whether
$\mu/\sigma = \delta'$. If $p = \Phi(-\delta)$ and $p' = \Phi(-\delta')$, where $\Phi(z)$ is the cumulative distribution function
for the standard normal distribution, this is equivalent to testing whether the probability
$\Pr\{X < 0\}$ has a value $p$ or whether it has a value $p'$.

The purpose of the present paper is to discuss briefly, along the lines of Barnard (1952),
the two-sided case where we are interested in testing whether the mean of $X$ is zero or
whether it is given by $|\mu| = \delta\sigma$, that is to say, that the alternative hypothesis includes the
possibilities that $\mu = +\delta\sigma$ and $\mu = -\delta\sigma$, and to mention the approximation appropriate
for this case.

The general case in which we wish to test whether $\mu = \mu_0$ against the alternative that
$\mu = \mu_0 + \delta\sigma$ or $\mu = \mu_0 - \delta\sigma$ is reducible to this one if we deduct $\mu_0$ from all the observations
on $X$. Such a question would be appropriate, for example, if we were interested in a measurement
$X$ on a certain product, and we considered it desirable that the values of $X$ should
centre about $\mu_0$, so that, if the production process is functioning correctly, half of the values
of $X$ should be less than $\mu_0$ and half of them should be greater than $\mu_0$. If, however, the
distribution of values of $X$ is so far from being centred on $\mu_0$ that a proportion $p$ of the
values are less than $\mu_0$ and a proportion $(1-p)$ are greater than $\mu_0$, or a corresponding
displacement occurs with a proportion $(1-p)$ less than $\mu_0$ and a proportion $p$ greater than
$\mu_0$, then in either case (or a worse) we wish to take suitable action, e.g. by adjusting the
production process or by rejecting batches of product produced under these conditions.
Choosing $\delta$ such that $p = \Phi(-\delta)$, we shall then be interested in testing whether the mean
of $X$ is zero against the alternative that $|\mu| = \delta\sigma$.

A particular application of such a two-sided t-test occurs in connexion with certain 
industrial measuring devices for which the zero setting is liable to wander. In any such case 
the decision whether to re-set the instrument or not may well be taken on the basis of 
a sequential procedure, the successive observations being the readings obtained from 
repeated measurements on the ‘standard’. If an appreciable bias, expressed as a standardized
departure from the known true reading and irrespective of sign, is detected, then the 
instrument will be taken out of service and re-set. 
2. THE LIKELIHOOD RATIO
Stated precisely, the hypotheses we wish to test are

$H$: $X$ is normally distributed with mean $\mu = 0$ and unspecified standard deviation
$\sigma$ $(0 < \sigma < \infty)$,

against the alternative hypothesis

$H'$: $X$ is normally distributed with mean $\mu = \kappa\delta\sigma$ and unspecified standard deviation
$\sigma$ $(0 < \sigma < \infty)$, where $\delta$ is a specified positive number and $\kappa$ is the unspecified sign
of $\mu$ ($\kappa = -1$ or $+1$).

As in the case of the one-sided $t$-test, $\sigma \in \Sigma$, where $\Sigma$ is the group of positive real numbers
forming under multiplication a group of transformations of the real axis (the space of $X$)
into itself. Again, $\kappa \in K$, where $K$ is the finite group consisting of the two elements $-1$ and
$+1$, which forms a group under multiplication, and, further, can also be regarded under
multiplication as a group of transformations of the space of $X$ into itself. Also, if $X$ is
normally distributed with parameters $\delta$, $\kappa$ and $\sigma$ and we put $Y = \kappa'X$, where $\kappa' \in K$, then
$Y$ is normally distributed with parameters $\delta$, $\kappa'\kappa$ and $\sigma$; whilst $Y = \sigma'X$, where $\sigma' \in \Sigma$, is
normally distributed with parameters $\delta$, $\kappa$ and $\sigma'\sigma$; and, considering the general transformation
$Y = \kappa'\sigma'X$, then $Y$ is normally distributed with parameters $\delta$, $\kappa'\kappa$ and $\sigma'\sigma$. The
hypotheses $H$ and $H'$ are therefore invariant under transformations of this general type.

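Now, we may express the likelihood function of a set of $n$ independent observations
$x_i$ $(i = 1, \ldots, n)$ on the variate $X$, as

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left\{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + (k|t|s - \sqrt{n}\,\kappa\delta\sigma)^2\right]\right\}, \tag{1}$$
where, in the usual way, we define
$$n\bar{x} = \sum_{i=1}^{n} x_i, \quad (n-1)s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 \quad\text{and}\quad t^2 = n\bar{x}^2/s^2, \tag{2}$$

and we denote the sign of $\bar{x}$ by $k$. It follows that $|t|$, $k$ and $s$ are jointly sufficient for $\delta$, $\kappa$
and $\sigma$. Also, under transformations $Y = \kappa'\sigma'X$, $k$ and $s$ become $\kappa'k$ and $\sigma's$, whilst $|t|$
remains unchanged. Hence, the likelihood ratio for testing the hypothesis $H$ against the
alternative $H'$ is

$$\Lambda = f(|t| \mid \delta)/f(|t| \mid 0), \tag{3}$$
where $f(|t| \mid \delta)$ is the p.d.f. of $|t|$ in the sample when $\delta$ is true.
In fact,
$$f(|t| \mid \delta) = e^{-n\delta^2/2}\,\frac{2(n-1)^{(n-1)/2}}{B(\tfrac12(n-1),\tfrac12)\,(n-1+t^2)^{n/2}}\;M\!\left(\tfrac12n,\tfrac12;\frac{n\delta^2t^2}{2(n-1+t^2)}\right), \tag{4}$$
so that
$$\Lambda = e^{-n\delta^2/2}\,M\!\left(\tfrac12n,\tfrac12,\frac{n\delta^2t^2}{2(n-1+t^2)}\right), \tag{5}$$

where $M(\alpha,\gamma,x)$ is the confluent hypergeometric function.
In practice, it is easier to work with $u^2$ where

$$u^2 = nt^2/(n-1+t^2) = \left(\sum_{i=1}^{n}x_i\right)^{2}\Big/\left(\sum_{i=1}^{n}x_i^2\right), \tag{6}$$

so that
$$\Lambda = e^{-n\delta^2/2}\,M(\tfrac12n,\tfrac12,\tfrac12\delta^2u^2). \tag{7}$$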
The justification for using this ratio as the basis of a sequential procedure follows the
same lines as for the one-sided test, except that we are now concerned with two nuisance
parameters, $\kappa$ and $\sigma$. Unique prior distributions for these parameters are defined by the
corresponding invariant Haar measures of the groups $K$ and $\Sigma$, and the pair of composite
hypotheses $H$ and $H'$ are thereby reduced to a pair of simple hypotheses. The invariant
Haar measure of a finite group containing exactly $m$ elements is just the number $1/m$, so
that the number $1/2$ is the appropriate Haar measure for $K$, whilst for $\Sigma$ we have again

$$\mu(a,b) = \int_a^b \frac{dx}{x}.$$

Following Barnard (1952), the likelihood ratio (5) is then obtained as the simple average
of two likelihood ratios appropriate for one-sided $t$-tests, one for testing that $\mu = 0$ against
$\mu = \delta\sigma$ and the other for testing $\mu = 0$ against $\mu = -\delta\sigma$.

For specified risks of error $\alpha$ and $\beta$, the sequential procedure will therefore consist in taking
additional observations $x_i$ as long as

$$\beta/(1-\alpha) < \Lambda < (1-\beta)/\alpha. \tag{8}$$

The hypothesis $H$ that the mean of $X$ is zero will be accepted as soon as $u^2 < u_0^2$, where
$u_0^2$ is the solution of the equation

$$M(\tfrac12n,\tfrac12,\tfrac12\delta^2u_0^2) = e^{n\delta^2/2}\,\beta/(1-\alpha), \tag{9}$$

and the hypothesis $H'$ will be accepted as soon as $u^2 > u_1^2$, where $u_1^2$ is the solution of the
equation
$$M(\tfrac12n,\tfrac12,\tfrac12\delta^2u_1^2) = e^{n\delta^2/2}\,(1-\beta)/\alpha. \tag{10}$$

That these statements are true follows from the fact that $M(\alpha,\gamma,x)$ is a strictly increasing
function of $x$.

The test proposed by Wald (1947, §A9.2) and based on his use of 'weight functions'
gives the likelihood ratio
$$\Lambda = e^{-n\delta^2/2}\,M(\tfrac12(n-1),\tfrac12,\tfrac12\delta^2u^2) \tag{11}$$
instead of (7).

As a matter of historical fact, it appears that H. Goldberg was the first to make specific
reference to the properties of likelihood ratios using the ratios of probability density
functions of non-central $t$-distributions. In a footnote in Eisenhart et al. (1947, p. 83), there
occurs the statement: ‘An exact treatment of proportion defective problems by sequential
analysis of variables involves the ratio of densities of two non-central $t$ distributions.
Henry Goldberg has shown that this may be expressed in terms of a confluent hypergeometric
function, ...’*. Extensive tables of $M(\alpha,\tfrac12,x)$ have been published by National
Bureau of Standards (1949) to facilitate the exact evaluation of the ratio (7), and in a subsequent
table (National Bureau of Standards, 1951) the boundary values $u_0^2$ and $u_1^2$, solutions
of equations (9) and (10), have been given for a wide range of values of $\alpha$, $\beta$, $\delta$ and $n$. These
latter tables, in which $z$ is used for our $u^2$ and which are based on the use of the likelihood
ratio (7) and not on Wald's ratio (11) (although they can be adapted for use with the latter
ratio), are a complete solution to the problem of the exact determination of the acceptance
and rejection boundaries for a sequential procedure for the two-sided $t$-test. We shall,
however, mention a practical procedure for carrying out the test which makes use of an
approximation to the likelihood ratio (7).


* This remark was brought to my notice by the Editor subsequent to the publication of my previous 
paper (Rushton, 1950). 
3. ANALYSIS OF VARIANCE: THE CASE OF TWO GROUPS


An analysis of variance based on a one-way classification with only two groups is of some
interest here, first because the test of the difference between the means of the two groups
involves a two-sided $t$-test, and also because an additional nuisance parameter has to be
taken into account. We shall consider that we have an equal number of observations
$x_{ij}$ $(i = 1, 2;\ j = 1, \ldots, n)$ on two variates $X_1$ and $X_2$. The case of unequal numbers in the two
groups follows in the same way. Defining

$$\bar{x}_{i.} = \sum_{j=1}^{n} x_{ij}/n \quad (i = 1, 2) \tag{12}$$
and
$$\bar{x}_{..} = (\bar{x}_{1.} + \bar{x}_{2.})/2, \tag{13}$$

the 'Between Groups Mean Square' is

$$n\sum_{i=1}^{2}(\bar{x}_{i.} - \bar{x}_{..})^2 = n(\bar{x}_{1.} - \bar{x}_{2.})^2/2, \tag{14}$$
and the 'Within Groups Mean Square' is given by
$$s^2 = \sum_{i,j}(x_{ij} - \bar{x}_{i.})^2/(2n-2). \tag{15}$$


We shall consider a test of the hypothesis

$H$: $X_1$ and $X_2$ are independently normally distributed with unspecified standard
deviation $\sigma$ $(0 < \sigma < \infty)$ and equal means $\mu_1 = \mu_2 = \theta$, where $\theta$ $(-\infty < \theta < \infty)$ is
unspecified,

against the alternative hypothesis

$H'$: $X_1$ and $X_2$ are independently normally distributed with unspecified standard
deviation $\sigma$ $(0 < \sigma < \infty)$ and means $\mu_1 = \theta + \kappa\phi\sigma$ and $\mu_2 = \theta - \kappa\phi\sigma$ respectively,
where $\theta$ $(-\infty < \theta < \infty)$ and $\kappa$ ($\kappa = -1$ or $1$) are also unspecified, and $\phi$ is a specified
positive number.

In these hypotheses, the additional nuisance parameter $\theta$ belongs to the set $M$ of real
numbers which forms a group under addition and can be regarded as a group of transformations
on the space of $X_1$ and $X_2$. The invariant Haar measure of the group $M$ is clearly

$$\mu(c,d) = \int_c^d dx = \mu(c+\theta', d+\theta').$$

The hypotheses $H$ and $H'$ are invariant under the general transformations
$$Y_i = \theta' + \kappa'\sigma'X_i \quad (i = 1, 2).$$
Further, if we define
$$t^2 = n(\bar{x}_{1.} - \bar{x}_{2.})^2/(2s^2) \tag{16}$$

and denote the sign of $(\bar{x}_{1.} - \bar{x}_{2.})$ by $k$, then it is evident that $k$, $s$, $\bar{x}_{..}$ and $|t|$ are jointly
sufficient for $\kappa$, $\sigma$, $\theta$ and $\phi$. The appropriate likelihood ratio is then

$$\Lambda = f(|t| \mid \phi)/f(|t| \mid 0),$$
where $f(|t| \mid \phi)$ is the p.d.f. of $|t|$ in the sample if $\phi$ is true. This reduces to

$$\Lambda = e^{-n\phi^2}\,M\!\left(\frac{2n-1}{2},\frac12,\frac{n\phi^2t^2}{(2n-2)+t^2}\right). \tag{17}$$
In practice we shall use

$$u^2 = \frac{(2n-1)\,t^2}{(2n-2)+t^2} = \frac{n(\bar{x}_{1.} - \bar{x}_{2.})^2/2}{\sum_{i,j}(x_{ij} - \bar{x}_{..})^2/(2n-1)}, \tag{18}$$
and
$$\delta = \sqrt2\,\phi = |\mu_1 - \mu_2|/(\sqrt2\,\sigma). \tag{19}$$

In terms of $\delta$, the hypotheses being tested are then that $\Pr\{X_1 < X_2\} = \tfrac12$ against the
alternative that this probability is $\Phi(-\delta)$ or $\Phi(+\delta)$, regardless of the actual location of the
distributions of $X_1$ and $X_2$.



4. APPROXIMATION FOR LOG $\Lambda$
It can be shown that, to a first approximation,
$$M(\alpha,\tfrac12,x) \approx \tfrac12\exp[\tfrac12x + 2\sqrt{\alpha x}], \tag{20}$$
and, to a second approximation,
$$M(\alpha,\tfrac12,x) \approx e^{x/2}\cosh 2\sqrt{\alpha x}. \tag{21}$$
This can be shown by the same method as that used in Rushton (1950):
$$v(x) = x^{1/4}e^{-x/2}M(\alpha,\tfrac12,x)$$
satisfies the differential equation
$$v'' + \{-\tfrac14 + (\tfrac14-\alpha)/x + 3/(16x^2)\}\,v = 0; \tag{22}$$
assuming a solution of this equation in the form
$$v(x) = y(x)\exp[h\theta(x)]\{1 + w_1(x)/h + w_2(x)/h^2 + \ldots\},$$
and applying the boundary conditions $M(\alpha,\tfrac12,0) = 1$ and $M'(\alpha,\tfrac12,0) = 2\alpha$, the approximations
(20) and (21) are obtained.
Substituting for $M(\tfrac12n,\tfrac12,\tfrac12\delta^2u^2)$ in the expression (7) and taking logarithms, the first
and second approximations $l_1$, $l_2$ for the logarithm of the likelihood ratio are obtained as

$$l_1 = \tfrac14\delta^2u^2 + \sqrt{n\delta^2u^2} - (\tfrac12n\delta^2 + \log_e 2), \tag{23}$$
and
$$l_2 = l_1 + \log\{1 + e^{-2\sqrt{n\delta^2u^2}}\}. \tag{24}$$
Since $\log(1+x) \approx x$ for $x$ small, we have approximately
$$l_2 = l_1 + e^{-2\sqrt{n\delta^2u^2}}. \tag{25}$$

The practical procedure for using these approximations to $\log\Lambda$ involves recording
$+\sqrt{n\delta^2u^2}$, $-(\tfrac12n\delta^2 + \log_e 2)$ and $\tfrac14\delta^2u^2$ and comparing $l_1$, or $l_2$ if necessary, with $\log\{\beta/(1-\alpha)\}$
and $\log\{(1-\beta)/\alpha\}$. It will usually be the case that it is not actually necessary to compute
$l_1$ before the final stage, since inspection of the values of $+\sqrt{n\delta^2u^2}$, $-(\tfrac12n\delta^2 + \log_e 2)$ and
$\tfrac14\delta^2u^2$ will verify that $l_1$ is not approaching a critical value. Since $e^{-2\sqrt{n\delta^2u^2}} < 5\times10^{-3}$ when
$\sqrt{n\delta^2u^2} > 2.7$, it follows that if we work to two decimal places it will rarely be necessary to
obtain the correction $(l_2 - l_1)$ to be added to $l_1$ to give the second approximation $l_2$.
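
The approximations (20) and (21) can be compared with a direct evaluation of the confluent hypergeometric function. The sketch below sums the standard power series $M(a,b,x) = \sum_k (a)_k x^k/\{(b)_k\,k!\}$, which is adequate for the moderate positive arguments arising here; the function names and the illustrative values $n = 16$, $\delta = 1$, $u^2 = 1.542$ are assumptions of this example.

```python
import math


def kummer_m(a, b, x, tol=1e-15, max_terms=500):
    """Confluent hypergeometric function M(a, b, x) summed from its power series."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= (a + k) * x / ((b + k) * (k + 1))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def first_approx(a, x):
    """Approximation (20): (1/2) exp(x/2 + 2 sqrt(a x))."""
    return 0.5 * math.exp(0.5 * x + 2.0 * math.sqrt(a * x))


def second_approx(a, x):
    """Approximation (21): exp(x/2) cosh(2 sqrt(a x))."""
    return math.exp(0.5 * x) * math.cosh(2.0 * math.sqrt(a * x))


n, delta, u2 = 16, 1.0, 1.542                      # illustrative values only
a, x = 0.5 * n, 0.5 * delta ** 2 * u2
for name, value in (("series", kummer_m(a, 0.5, x)),
                    ("(20)", first_approx(a, x)),
                    ("(21)", second_approx(a, x))):
    # log of the likelihood ratio (7) under each evaluation of M
    print(name, round(math.log(value) - 0.5 * n * delta ** 2, 3))
```

The ratio of (21) to (20) is exactly $1 + e^{-4\sqrt{ax}}$, which with $a = \tfrac12 n$ and $x = \tfrac12\delta^2u^2$ is the correction term appearing in (24).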


5. THE PRACTICAL USE OF THE APPROXIMATION 


The practical procedure involved in using the above approximations is shown in Table 1 for
a sequential test in which $\delta = 1$ and $\alpha = \beta = 0.05$. In the case of the particular sample of
Table 1, taken from an example of National Bureau of Standards (1951), the observations
$x_i$ actually came from a population with zero mean. The table includes columns of exact
boundary values $u_0^2$ and $u_1^2$, and it is seen that a definite decision in favour of the hypothesis
$H$ that the mean is zero is reached at the 16th stage when $u^2$ falls below $u_0^2$. No decision
is possible in favour of $H$ for $n < 6$ or in favour of $H'$ (that the mean $\mu$ is given by $|\mu/\sigma| = 1$)
for $n < 5$, because of the algebraic condition on $u^2$ that $u^2 \le n$. Successive columns of Table 1
show values of $+\sqrt{n\delta^2u^2}$, $-(\tfrac12n\delta^2 + \log_e 2)$ and $\tfrac14\delta^2u^2$, which are necessary for computing
the approximations $l_1$ and $l_2$ to $\log\Lambda$. If exact boundary values are not used, the procedure
consists in comparing at each stage $\log\Lambda$ with $\log\{\beta/(1-\alpha)\}$ and $\log\{(1-\beta)/\alpha\}$, in this case
$\pm 2.94$. Working to two decimal places, the approximation $l_1$ is obtained as shown and the
hypothesis $H$ is accepted at the correct stage. It will be noticed that at no stage before the
16th is it actually necessary to obtain $l_1$, since by inspection it is seen that

$$\{\sqrt{n\delta^2u^2} + \tfrac14\delta^2u^2 - (\tfrac12n\delta^2 + \log_e 2)\}$$

nowhere approaches the critical limits $\pm 2.94$ very closely before a decision is actually
reached. It will in fact usually be the case that inspection of the values of $+\sqrt{n\delta^2u^2}$,
$-(\tfrac12n\delta^2 + \log_e 2)$ and $\tfrac14\delta^2u^2$ will verify that $l_1$ is not approaching a critical value without
it being necessary to do the arithmetic accurately. Also, in this example, it is nowhere
necessary to calculate the correction $(l_2 - l_1)$ to be added to $l_1$ to obtain the second approximation
$l_2$. Since $e^{-2\sqrt{n\delta^2u^2}} < 5\times10^{-3}$ when $\sqrt{n\delta^2u^2} > 2.7$, this correction is everywhere
negligible except when it is clearly not going to affect the decision taken; at the 2nd and
5th stages the corrections are 0.02 and 0.01 respectively, but $\log\Lambda$ is in both cases very far
from the critical values.


Table 1

 n    x_i   Σx_i    Σx_i²    u²     u_0²    u_1²   +√(nδ²u²)  −(½nδ²+log_e 2)  ¼δ²u²    l_1    l_2−l_1
 1     41     41     1681     -      -       -        -            -            -        -       -
 2     76    117     7457   1.836    -       -       1.92        −1.69         0.46     0.69    0.02
 3     65    182    11682   2.835    -       -       2.92        −2.19         0.71     1.44     -
 4    162    344    37926   3.120    -       -       3.53        −2.69         0.78     1.62     -
 5   −111    233    50247   1.080    -     4.978     2.32        −3.19         0.27    −0.60    0.01
 6     29    262    51088   1.344  0.019   4.992     2.84        −3.69         0.34    −0.52     -
 7     −2    260    51092   1.323  0.185   5.053     3.04        −4.19         0.33    −0.82     -
 8     30    290    51992   1.618  0.350   5.145     3.60        −4.69         0.40    −0.69     -
 9    −43    247    53841   1.133  0.519   5.258     3.19        −5.19         0.28    −1.72     -
10     18    265    54165   1.297  0.692   5.387     3.60        −5.69         0.32    −1.77     -
11    243    508   113214   2.279  0.870   5.528     5.01        −6.19         0.57    −0.62     -
12    −89    419   121135   1.449  1.051   5.678     3.55        −6.69         0.36    −2.78     -
13    −27    392   121864   1.261  1.234   5.835     4.05        −7.19         0.31    −2.83     -
14    139    531   141185   1.997  1.421   5.997     5.29        −7.69         0.50    −1.91     -
15    −41    490   142866   1.681  1.609   6.164     5.02        −8.19         0.42    −2.75     -
16    −20    470   143266   1.542  1.798   6.335     4.97        −8.69         0.39    −3.34     -


In experiments with different schemes, carried out to test the use of this approximation,
the above procedure using the approximations to $\log\Lambda$ resulted in all cases in the same
decisions at the same stage as were reached using exact boundary values. In any sequential
procedure, it has to be remembered that the ‘excess over the boundaries’ when a decision
is reached means that an approximation process does not need to be exceptionally accurate
in order to be entirely satisfactory in any practical case. The present approximations and
that for the one-sided $t$-test have, indeed, been found to be satisfactory, even for small $n$.
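
The procedure of this section can be sketched in a few lines. The code below accumulates $\sum x_i$ and $\sum x_i^2$, forms $u^2$ from (6), and compares the second approximation $l_2$ of (24) with the limits $\log\{\beta/(1-\alpha)\}$ and $\log\{(1-\beta)/\alpha\}$. Function names and the returned labels are illustrative choices, and the exact tabulated boundaries $u_0^2$, $u_1^2$ are not used.

```python
import math


def log_ratio_approx(n, sum_x, sum_x2, delta):
    """Return (l1, l2), the approximations (23) and (24) to log Lambda."""
    u2 = sum_x ** 2 / sum_x2
    root = math.sqrt(n * delta ** 2 * u2)
    l1 = 0.25 * delta ** 2 * u2 + root - (0.5 * n * delta ** 2 + math.log(2.0))
    l2 = l1 + math.log1p(math.exp(-2.0 * root))
    return l1, l2


def sequential_t_test(observations, delta, alpha, beta):
    """Two-sided sequential t-test using l2; returns (decision, stage)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    sum_x = sum_x2 = 0.0
    for n, x in enumerate(observations, start=1):
        sum_x += x
        sum_x2 += x * x
        if sum_x2 == 0.0:                          # u^2 undefined while all x are zero
            continue
        _, l2 = log_ratio_approx(n, sum_x, sum_x2, delta)
        if l2 <= lower:
            return "accept H", n
        if l2 >= upper:
            return "accept H'", n
    return "continue sampling", len(observations)


# Observations x_i of Table 1, with delta = 1 and alpha = beta = 0.05.
table1_x = [41, 76, 65, 162, -111, 29, -2, 30, -43, 18, 243, -89, -27, 139, -41, -20]
print(sequential_t_test(table1_x, delta=1.0, alpha=0.05, beta=0.05))
```

With the Table 1 observations the sketch accepts $H$ at the 16th stage, the stage at which the text reports a decision.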

If we consider the effect of taking the standard deviation $s$ estimated from the observations
as being equal to the unknown $\sigma$, we can see that the approximations $l_1$ and $l_2$ converge
as we would expect for large $n$, since the effect of ‘studentization’ diminishes rapidly for
increasing $n$. The likelihood ratio is that appropriate to the case where $\sigma$ is known with
$s$ replacing $\sigma$, so that

$$\Lambda = \{\exp[-\tfrac12\textstyle\sum(x_i - \delta s)^2/s^2] + \exp[-\tfrac12\sum(x_i + \delta s)^2/s^2]\}/\{2\exp[-\tfrac12\sum x_i^2/s^2]\}$$
$$= \tfrac12e^{-n\delta^2/2}\,(e^{\delta\sum x_i/s} + e^{-\delta\sum x_i/s})$$
$$= \tfrac12e^{-n\delta^2/2}\{\exp[\sqrt{n\delta^2u^2} + \tfrac12\delta u^3/\sqrt{n}] + \exp[-\sqrt{n\delta^2u^2} - \tfrac12\delta u^3/\sqrt{n}]\}. \tag{26}$$
It follows that to a first approximation
$$\log\Lambda = \tfrac12\delta u^3/\sqrt{n} + \sqrt{n\delta^2u^2} - (\tfrac12n\delta^2 + \log_e 2), \tag{27}$$

which differs from (23) in having the term $\tfrac12\delta u^3/\sqrt{n}$ instead of $\tfrac14\delta^2u^2$. For $n$ sufficiently large,
these terms are negligible compared with the other terms in (27), so that for large $n$ the
approximation (23) converges to (27).


6. THE SAMPLE SIZE 


As with the one-sided sequential $t$-test, a lower bound to the mean sample size in the
present case of testing the hypothesis that the mean $\mu$ is zero against the hypothesis that
$|\mu| = \delta\sigma$ can be obtained by applying the formulae of Wald (1947) for the mean sample
size when $X$ is normally distributed with unit variance, and we are testing the hypothesis
that $\mu = 0$ against the hypothesis that $|\mu| = \delta$.
TESTS OF FIT IN TIME SERIES


By P. WHITTLE 
University Institute of Statistics, Upsala 


1. The Neyman-Pearson test theory has as its aim the optimum discrimination between 
two sets of hypotheses, the null and counter-hypotheses. In some presentations of the 
theory these two hypotheses are assumed mutually exclusive, a supposition which is 
essential to the concept of discrimination. However, an important class of tests is that for 
which the counter-hypothesis includes the null hypothesis, so that a model belonging to the 
set of null hypotheses belongs also to the set of counter-hypotheses. These are the tests 
of fit. 

The variance-ratio tests of an analysis of variance are of this type: the likelihood that the 
means of the observations have a certain structure is compared with the likelihood that they 
have a somewhat more general structure (e.g. the hypothesis that all treatment constants 
are zero is compared with that which permits them to vary). Thus, the best fit achievable 
on the null hypothesis is compared with the best fit obtainable in the wider class of 
alternatives permitted by the counter-hypothesis. 

Tests of fit have a number of advantages which are seldom shared by the discriminatory 
type of test. First, the test statistics usually have simple and parameter-free distribution 
functions—a benefit upon which we need hardly enlarge. One consequence is that the 
evaluation of confidence regions becomes a very simple matter. Again, if none of the null 
hypotheses taken up for consideration is suitable, an appropriately chosen test of fit will 
indicate this, while a pairwise comparison of hypotheses may not do so. Lastly, dis- 
criminatory tests suffer from the disadvantage that the choice of the counter-hypothesis is 
often a very arbitrary affair. Of course, the same is true in lesser degree for a test of fit, but 
the appropriate counter-hypothesis is generally fairly obvious in any particular case. 

In this paper we shall deduce certain tests useful in time-series analysis, using least-square
methods to construct our statistics. That is, if $\hat{U}_0$ is the minimum value of the
residual sum of squares permitted by the null hypothesis $H_0$, and $\hat{U}_1$ that permitted by the
more general counter-hypothesis $H_1$, then the test statistic will be the ratio

$$\lambda = \hat{U}_1/\hat{U}_0. \tag{1.1}$$

The statistic $\lambda$ measures the fit of the hypothesis $H_0$ when we restrict our attentions to the
alternatives permitted by $H_1$. Its value will approach unity or zero depending upon whether
the fit is good or bad. When the residuals are normally distributed, $\lambda$ is of course equivalent
to the ratio of maximum likelihoods.


2. Let $x_t$ be a stationary, purely non-deterministic process, the reciprocal of whose
spectrum may be expanded in a Fourier series. The process may then be expressed

$$x_t + a_1x_{t-1} + a_2x_{t-2} + \ldots = \epsilon_t, \tag{2.1}$$

i.e. as an autoregression, generally infinite. Equation (2.1) forms a convenient starting-point
for the application of least-squares methods of estimation and testing.
Suppose now that we have a time series generated by the process $x_1, x_2, \ldots, x_n$. Since the
process contains no deterministic components, $E(x) = 0$, and we may define the empirical
autocovariance of lag $s$ as

$$C_s = \frac{1}{n}\sum_{t=1}^{n-s} x_tx_{t+s}, \tag{2.2}$$
and the periodogram as
$$f(\omega) = \sum_{s=-n+1}^{n-1} C_s e^{i\omega s}. \tag{2.3}$$
Let the corresponding theoretical quantities be $\phi(s)$ and $F(\omega)$, so that

$$F(\omega) = \sum_{s=-\infty}^{\infty} \phi(s)e^{i\omega s}. \tag{2.4}$$
Now, by a result of Kolmogoroff's (1941), the variance of the residual variate $\epsilon$ is given by

$$v = \exp\left[\frac{1}{2\pi}\int_0^{2\pi}\log F(\omega)\,d\omega\right]. \tag{2.5}$$

We shall define the normalized spectral function $G(\omega) = F(\omega)/v$, which obviously obeys
the equality
$$\int_0^{2\pi}\log G(\omega)\,d\omega = 0. \tag{2.6}$$


We shall regard it as our aim to estimate the function $G(\omega)$.
Now, to our $n$ sample variates correspond $n$ residual variates $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$. The author has
shown (Whittle, 1951) that the residual sum of squares, $U = \sum_{1}^{n}\epsilon_t^2$, is, apart from an
end-correction, given by
$$U = n\sum_s A_sC_s, \tag{2.7}$$

where $A_s$ is the coefficient of $e^{i\omega s}$ in the Fourier expansion of $[G(\omega)]^{-1}$. However, while (2.7)
is the preferable form for numerical calculation, for theoretical work we shall find it more
convenient to express $U$ in terms of the periodogram
$$U = \frac{n}{2\pi}\int_0^{2\pi}\frac{f(\omega)}{G(\omega)}\,d\omega. \tag{2.8}$$

Suppose now that $G(\omega)$ has least-squares estimate $\hat{G}(\omega)$. The expression in (2.8) must be
minimized subject to (2.6), which holds identically for all modifications in $G$ (such as
parameter variation), so that the estimation equations are

$$\frac{n}{2\pi}\int_0^{2\pi}\frac{f(\omega)}{\hat{G}(\omega)}\,d\omega = \text{minimum} = \hat{U}, \qquad \int_0^{2\pi}\log\hat{G}(\omega)\,d\omega = 0 \tag{2.9}$$

(see Whittle, 1952). $\hat{U}$ is here the minimized residual sum of squares, which we shall take as
the touchstone of a hypothesis' agreement with observation, just as in classical least-squares
theory. Thus, suppose that hypothesis $H_1$ specifies a certain type of function for $G(\omega)$, but
with $p+q$ undetermined parameters, while the more restricted hypothesis $H_0$ permits only
$p$ of these parameters to vary. If the respective values of $\hat{U}$ yielded by equation (2.9) are
denoted by $\hat{U}_{p+q}$ and $\hat{U}_p$, then the ratio

$$\lambda = \hat{U}_{p+q}/\hat{U}_p \tag{2.10}$$

measures the fit of $H_0$ relative to the alternatives permitted by $H_1$. It will be further shown
in the appendix that, if the residual variates are independently distributed, then the
statistic
$$\psi^2 = (n-p-q)\,\frac{\hat{U}_p - \hat{U}_{p+q}}{\hat{U}_{p+q}} = (n-p-q)\,\frac{1-\lambda}{\lambda} \tag{2.11}$$
is under hypothesis $H_0$ asymptotically distributed as $\chi^2$ with $q$ degrees of freedom. This
result permits the testing of the variance ratio $\lambda$. Thus, a significant value of $\psi^2$ indicates
bad fit, while a value in the neighbourhood of expectation indicates good fit.

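For concreteness we shall now examine a few special cases. Consider the autoregression

$$x_t + a_1x_{t-1} + \ldots + a_px_{t-p} = \epsilon_t, \tag{2.12}$$
for which
$$[G(\omega)]^{-1} = (1 + a_1e^{i\omega} + \ldots + a_pe^{ip\omega})(1 + a_1e^{-i\omega} + \ldots + a_pe^{-ip\omega}), \tag{2.13}$$

so that the sum in (2.7) is finite. Minimizing $U$ with respect to $a_1, a_2, \ldots, a_p$ (which are all
supposed unknown) we find that the minimized sum of squares is

$$\hat{U} = n\Delta_p/\Delta_{p-1}, \tag{2.14}$$
where
$$\Delta_k = \begin{vmatrix} C_0 & C_1 & \cdots & C_k \\ C_1 & C_0 & \cdots & C_{k-1} \\ \vdots & \vdots & & \vdots \\ C_k & C_{k-1} & \cdots & C_0 \end{vmatrix}. \tag{2.15}$$

Thus, if we wished to test the advisability of increasing the order to $p+1$ we would calculate
the variance ratio

$$\lambda = \Delta_{p+1}\Delta_{p-1}/\Delta_p^2. \tag{2.16}$$

If hypothesis (2.12) were true, $(n-p-1)(1-\lambda)/\lambda$ should then be distributed as $\chi^2$ with one
degree of freedom.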
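
The order test built on (2.14) to (2.16) can be sketched as follows. The empirical autocovariances use the divisor $n$ as in (2.2), the determinants $\Delta_k$ are evaluated by plain Gaussian elimination, and $\Delta_{-1}$ is taken as 1 so that order 0 (no autoregressive terms) can serve as the null hypothesis. These conventions and all function names are assumptions of this illustration.

```python
def autocovariances(x, max_lag):
    """Empirical autocovariances C_0, ..., C_max_lag with divisor n (mean assumed zero)."""
    n = len(x)
    return [sum(x[t] * x[t + s] for t in range(n - s)) / n for s in range(max_lag + 1)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    det = 1.0
    for col in range(len(a)):
        pivot = max(range(col, len(a)), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, len(a)):
            factor = a[r][col] / a[col][col]
            for c in range(col, len(a)):
                a[r][c] -= factor * a[col][c]
    return det


def toeplitz_det(c, k):
    """Delta_k of (2.15); Delta_{-1} is taken as 1 by convention."""
    if k < 0:
        return 1.0
    return determinant([[c[abs(i - j)] for j in range(k + 1)] for i in range(k + 1)])


def order_test(x, p):
    """Return (lambda, psi2) for testing order p against order p + 1."""
    n = len(x)
    c = autocovariances(x, p + 1)
    lam = toeplitz_det(c, p + 1) * toeplitz_det(c, p - 1) / toeplitz_det(c, p) ** 2
    psi2 = (n - p - 1) * (1.0 - lam) / lam
    return lam, psi2


series = [1.0, 2.0, 0.5, -1.0, -2.0, -0.5, 1.0, 1.5]   # toy data, for illustration only
print(order_test(series, 0), order_test(series, 1))
```

For $p = 0$ the ratio reduces to $\lambda = 1 - (C_1/C_0)^2$; the returned $\psi^2$ would be referred to the $\chi^2$ distribution with one degree of freedom.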

The sum in (2.7) will not be finite for other hypotheses than the autoregressive, but the
coefficients $A_s$ are usually quickly convergent, so that it is sufficient to consider a limited
number of terms, the exact number being clear in any particular case.

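Suppose that we wished to compare the hypotheses

$$x_t = \epsilon_t + b\epsilon_{t-1} \tag{2.17}$$
and
$$x_t + ax_{t-1} = \epsilon_t + b\epsilon_{t-1}. \tag{2.18}$$
We find that the corresponding values of $U$ are

$$\hat{U}_1 = \min_b\;\frac{n}{1-b^2}\sum_s(-b)^{|s|}C_s, \tag{2.19}$$
$$\hat{U}_2 = \min_{a,b}\;\frac{n}{1-b^2}\sum_s(-b)^{|s|}\{(1+a^2)C_s + a(C_{s+1} + C_{s-1})\}. \tag{2.20}$$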

3. In time-series testing, one natural set of alternative hypotheses is the class of all
stationary, purely non-deterministic processes. In order to be able to apply the theory of
the previous section we shall also require that the residual variates are independently
distributed, and that the spectral function is such that its reciprocal is expandable in
a Fourier series (for all practical purposes, that the spectral intensity be nowhere zero, as it
is, for example, for the process $x_t = \epsilon_t - \epsilon_{t-1}$). These restrictions will seldom be found irksome
in practice. Now, such a process may be represented as in (2.1), and to calculate the minimum
residual variance we should require to estimate, by least squares, all of the coefficients
$a_1, a_2, \ldots$ in the representation. It is, of course, not possible to estimate an infinite number
of parameters from a finite sample, but the series of $a$ coefficients must converge, and by
considering sufficiently many coefficients we should be able to obtain an arbitrarily good
approximation to the real process. In other words, if we graduate the process with an
autoregressive scheme of sufficiently high order, we should be able to obtain an estimate of
the residual variance whose bias is arbitrarily small. Such a graduation implies a gross
overfitting, but this can be allowed for. There is, of course, nothing special about the
autoregressive scheme; we could equally well graduate with a high-order moving average, and
there are many other possibilities. The point is that if these graduations are of sufficiently
high order they should all lead to the same estimate of the residual variance. In practice,
however, the autoregressive graduation has the advantage that the estimated residual sum
of squares can be written down directly in terms of the observations (see (2.14)) without the
need to solve explicitly for the estimates of the $a$ coefficients.

Theoretically, then, our counter-hypothesis is ‘all autoregressive schemes of order k or 
less’, where k is the order of graduation. However, for sufficiently high k this is equivalent 
to ‘all purely non-deterministic stationary processes’, with the above limitations. 

Suppose, now, that any particular hypothesis, entailing $p$ $\theta$-parameters, leads to a minimized
residual sum of squares $\hat U_p$, while the corresponding quantity from the graduation
will be denoted simply as $\hat U$. We have then that

$$\psi^2 = (n-k)(\hat U_p - \hat U)/\hat U \tag{3-1}$$

is distributed as $\chi^2$ with $k - p$ degrees of freedom, for sufficiently large $k$ and $n$. A significantly
high value of $\psi^2$ indicates bad fit, while a value in the neighbourhood of expectation indicates
satisfactory fit, at least if we restrict our attention to schemes of the type considered.
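
The calculation behind (3-1) can be sketched in a few lines. The sketch below is only illustrative: it obtains the residual variances of autoregressive fits by the Durbin-Levinson recursion, which is swapped in here for the determinantal expression (2-14) of the text, and the function names, the sign convention $x_t = \sum \phi_j x_{t-j} + \epsilon_t$ and the numerical inputs are all assumptions made for the example.

```python
def ar_residual_variances(r, max_order):
    """Residual variances of autoregressive fits of order 0..max_order.

    r[0..max_order] are autocovariances (or autocorrelations, since a scale
    factor is immaterial). Uses the Durbin-Levinson recursion, an assumption
    of this sketch rather than the paper's formula (2-14).
    """
    v = [r[0]]   # order 0: nothing fitted
    phi = []     # coefficients of the current fit
    for k in range(1, max_order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        pk = acc / v[-1]  # partial autocorrelation at lag k
        phi = [phi[j] - pk * phi[k - 2 - j] for j in range(k - 1)] + [pk]
        v.append(v[-1] * (1.0 - pk * pk))
    return v


def psi_squared(n, k, rss_hypothesis, rss_graduation):
    """Statistic (3-1): (n - k)(U_p - U)/U, referred to chi-square with k - p d.f."""
    return (n - k) * (rss_hypothesis - rss_graduation) / rss_graduation


# Hypothetical illustration: autocorrelations 0.5**lag of a first-order scheme.
r = [0.5 ** lag for lag in range(6)]
v = ar_residual_variances(r, 5)
print([round(x, 4) for x in v])
print(round(psi_squared(n=100, k=5, rss_hypothesis=0.80, rss_graduation=v[5]), 3))
```

For this artificial correlogram the residual variance stops decreasing after order one, which is the behaviour one looks for when choosing the order of graduation.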

The idea of fitting an autoregression, for example, of successively increasing order until 
significance appears to be exhausted, is no new one. Yule (1927) used this technique regularly, 
and further surmised that the partial autocorrelation coefficient had roughly the same 
distribution as a partial correlation coefficient, the equivalent of (2-11) in this particular 
case. However, an autoregression (finite) was always tested against an autoregression. In
our case, we test any hypothesis of type (2-1) against the autoregressive graduation, which 
is supposed to be of so high order that it includes all likely hypotheses. However, it is again 
emphasized that the fact that the graduation is an autoregressive one is quite inessential 
to the argument. 

An awkward point is, of course, the determination of k, the order of graduation. If k is
too small, then the graduation is inadequate, and cannot be said to include all likely
hypotheses. On the other hand, the greater k becomes with respect to n, the greater is the
deviation of the true $\psi^2$ distribution from the $\chi^2$ form. Ideally, k should be chosen somewhat
beyond the point where the addition of extra terms to the graduation does not seem to
decrease significantly the residual variance, if only n is sufficiently large to permit this.
We can never be certain, of course, that if we had continued the graduation yet a step we
would not have found a significant change, but this is a difficulty essential to all inductive
reasoning.

Quenouille (1947) has proposed a test for the autoregressive scheme which also leads to
a test statistic distributed as $\chi^2$, so it may be appropriate to compare the two, even if we
thereby risk giving the false impression that the autoregressive scheme is our particular
preoccupation!

The least-squares statistic for discriminating between autoregressive schemes of orders
$p$, $p+q$ is (see (2-16))

$$\frac{\Delta_{p+q}\,\Delta_{p-1}}{\Delta_{p+q-1}\,\Delta_p} = (1 - r^2_{1,p+1.23\ldots p})(1 - r^2_{1,p+2.23\ldots p+1}) \cdots (1 - r^2_{1,p+q.23\ldots p+q-1}), \tag{3-2}$$

where $r_{1j.23\ldots j-1}$ is the partial correlation coefficient of $x_1$, $x_j$, the linear effects of $x_2, \ldots, x_{j-1}$
being removed. Quenouille’s statistic is, in effect,

$$r^2_{1,p+1.23\ldots p} + r^2_{1,p+2.23\ldots p} + \cdots + r^2_{1,p+q.23\ldots p}. \tag{3-3}$$

(3-2) and (3-3) are obviously not identical, Quenouille compounding likelihood ratios by
addition rather than multiplication, although this is not the only difference. Neither will
the two tests agree if the counter-hypothesis is any other than an autoregression, for then
the least-squares statistic will, theoretically at least, involve autocovariances of all orders.


4. An artificial experiment was carried out, using 150 terms generated by the moving
average model $x_t = \epsilon_t + \epsilon_{t-1} - \tfrac12\epsilon_{t-2}$, where the residuals were normally distributed about
zero. The sample values of the first twenty-one autocorrelations were as follows:

Lag   Coeff.      Lag   Coeff.      Lag   Coeff.      Lag   Coeff.
 1     0.249       7     0.0731     13    -0.0125     19     0.0342
 2    -0.1806      8    -0.0644     14    -0.1109     20    -0.0160
 3    -0.0148      9    -0.1257     15    -0.1176     21    -0.0360
 4    -0.0351     10    -0.0591     16     0.0842
 5    -0.0367     11     0.0513     17     0.0135
 6     0.0552     12     0.0091     18    -0.0483

These autocorrelations may be used in place of the autocovariances, since a scale factor is
immaterial.

The graduation was an autoregressive one of order 10, yielding $\hat U = 0.78087n$. The two
hypotheses tested were: autoregression and moving average of order two, which gave
$\hat U_p = 0.84430n$, $0.78764n$ respectively. The corresponding $\psi^2$ values are 11.550, 1.274, which
have probabilities 0.190, 0.994 of being exceeded ($\chi^2$ with 8 D.F.). It is overwhelmingly
obvious that the moving average should be accepted before the autoregression, surprisingly 
so, in view of the indecisiveness which has seemed to be a characteristic of time-series tests. 

That the moving average scheme gives such an improbably good fit is an almost certain 
indication that the graduation is insufficient, ten terms not really sufficing to represent the 
process. However, there is no doubt about the result. 


5. To illustrate the practical application of the above methods we shall briefly describe
the testing of certain models fitted to the well-known Beveridge index series (300 terms,
trend removed). The fitting of an autoregression of successively increasing order yielded
the following values of $\hat v = \hat U/n$:

Order   $\hat v_p$      $\psi^2 = (n-p)(\hat v_{p-1} - \hat v_p)/\hat v_p$
  1     0.684156    161.118
  2     0.617939     37.290
  3     0.617009      0.523
  4     0.616181      0.346
  5     0.615114      0.599
  6     0.604977      5.764
  7     0.596762      4.722
  8     0.585251      6.727
  9     0.585248      0.002


An application of test (2-11) shows that lags 1, 2, 6, 7 and 8 are the only ones which seem to
contribute anything of value. However, the significance of lags 6 and 7 may be only
apparent, a ‘contagion’ from lag 8. To test this, we compare the $\hat v$ values obtained by fitting
autoregressions of lags (1, 2, 7), (1, 2, 6, 7), (1, 2, 7, 8) and (1, 2, 8) with the graduation value,
$\hat v = 0.585248$. That is, we apply the test of fit. The results, summarized in the following
table, show that lag 6 is in fact of no significance, and lag 7 of rather doubtful significance:

Model             $\hat v$        $\psi^2$    Degrees of freedom    P
AR (1, 2, 7)      0.599010    7.992           6            0.20
AR (1, 2, 6, 7)   0.597876    7.335           5            0.22
AR (1, 2, 7, 8)   0.590827    3.240           5            0.60
AR (1, 2, 8)      0.594732    5.490           6            0.48


A similar test of the fit of moving average schemes of first and second orders leads to the
following results:

Model        $\hat v$      $\psi^2$     Degrees of freedom    P
MA (1)       0.6457    35.088           8            <0.0001
MA (1, 2)    0.6252    23.117           7            0.0003


so it is fairly obvious that we shall not find a better fit among the moving averages. The 
conclusion is, then, that of those models fitted, that giving best fit is an autoregression with 
lags 1, 2 and 8. It may further be said that the fit is a good one. 

It may be argued, and rightly, that the significance of lag 8 is not so great as would appear, 
since we have chosen the most significant of lags 3-9. However, the effect is too strong to 
be explained away in this fashion. Testing the fit of an autoregression (1,2) against an 
autoregression (1, 2,8) we obtain 

$$\psi^2 = 300(0.617939 - 0.594732)/0.594732 \simeq 11.66.$$

The probability that a $\chi^2$ variate should have a value < 11.66 is approximately 0.999, so
the probability that the greatest of seven variates should have a value less than 11.66 is then
$(0.999)^7 \simeq 0.993$, so that the probability that the lag 8 coefficient arose purely by chance
is only 0.007, even on this sterner criterion.
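
The arithmetic of this sterner criterion can be checked with a short sketch. It assumes, as the argument above does, one degree of freedom for the single extra coefficient and independence of the seven candidate variates; the function name is illustrative only.

```python
import math


def chi2_1df_cdf(x):
    """P(chi-square with 1 d.f. <= x), via the normal error function."""
    return math.erf(math.sqrt(x / 2.0))


stat = 11.66
p_single = chi2_1df_cdf(stat)   # one variate falls below 11.66
p_all_seven = p_single ** 7     # the greatest of seven independent variates does
print(round(p_single, 4), round(p_all_seven, 4), round(1 - p_all_seven, 4))

# The text rounds the single-variate probability to 0.999 before compounding.
print(round(0.999 ** 7, 3))
```

Compounding the unrounded probability gives a chance probability somewhat smaller than the 0.007 quoted, so the rounding in the text errs on the cautious side.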

Of course, a graduation of order 9 is definitely on the short side, and it is not impossible
that a longer graduation would reveal new features. Indeed, a comparison of the observed
correlogram with that calculated from the fitted scheme shows a sudden discrepancy at
$r_{16}$, so that it would seem that lag 16 is of importance. That lags of order 8 and 16 should
be present is more than a coincidence, and it seems probable that a scheme

$$x_t + a_1 x_{t-1} + a_2 x_{t-2} = \epsilon_t + b\,\epsilon_{t-8}$$

would lead to better fit. However, we shall not go further into the matter here, except to
note that the spectral function corresponding to the fitted (1, 2, 8) autoregression has
strong maxima at the $\omega$ values corresponding to periods of 6.20 and 14.95 years, in agreement
with the observed periodogram.

Similar calculations to the above were performed on the series of terms given by the values 
of the index from 1770 to 1869 (Wold, 1938). These values were unadjusted, since the series 
is for these years almost trend-free. The results tallied in detail with those of the above 
analysis (although it should be mentioned that the residual variances were uniformly 
smaller, an indication of greater coherence in the series), and this may be taken as indicating 
that the methods may be used with confidence on series of 100 terms, at least.


6. It is of interest to see what form the test statistic (3-1) takes as k, the order of graduation,
becomes comparable with n. Considering the autoregressive graduation of order k, we
know (Whittle, 1951) that, for large enough k,


ome)” 


If we restrict ourselves to the case in which the null hypothesis is fully specified, so that the 
estimated residual sum of squares is given by equation (2-8), then the limiting value of the 


variance ratio is mj\ 
o Lt) i] 
= lim (6-2) 


- Te) 4, 
on G(w Go)? 


Now f(w) x E(w) f.(o), (6-3) 


where $f_\epsilon(\omega)$ is the periodogram of the residual series $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$. Thus


oth ell sth 


~Se 





by equations (2-6) and (2-8). Now, consider the periodogram ordinates 


27) ; n—1 
Y; =.=) E = OM eo? 2 | 


where we have supposed that $\tfrac12(n-1)$ is an integer, equal to $m$, say. If we assume that the
$\epsilon$ variates are normally distributed, then it is known that the frequency function of
$y_1, y_2, \ldots, y_m$ is $c^m e^{-c\sum y_j}$, where $c$ is a constant. Further, since $y_j = y_{n-j}$, we can with good
approximation replace the geometric mean in (6-4) by the geometric mean of $y_1, y_2, \ldots, y_m$,


so that fm Lm 
(I ¥,) 





= Aum (6-5) 


say. We see from (6-5) that the limiting form of the statistic has the simple interpretation of 
the ratio of the geometric and arithmetic means of the residual variates’ periodogram. 

Let us now consider the distribution of this limiting form. We shall omit the algebra, 
but it is a fairly direct matter to show from the exponential frequency function of $y_1, y_2, \ldots, y_m$,
that $\log(\Lambda_{\infty m})$ has characteristic function


$$E\bigl[\Lambda_{\infty m}^{\,i\theta}\bigr] = \frac{m^{i\theta}\,\Gamma(m)\,[\Gamma(1 + i\theta/m)]^m}{\Gamma(m + i\theta)}. \tag{6-6}$$
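
One way of supplying the omitted algebra is sketched below. It assumes, as above, that the $m$ ordinates are independent with a common exponential frequency function; the symbols $S$, $w_j$ and $s$ are introduced only for this sketch, and $\Lambda$ stands for the ratio of the geometric to the arithmetic mean of the ordinates.

```latex
\begin{aligned}
S &= \sum_{j=1}^{m} y_j, \qquad w_j = y_j / S, \qquad
\Lambda = \frac{\bigl(\prod_{j} y_j\bigr)^{1/m}}{S/m} = m \prod_{j=1}^{m} w_j^{1/m}, \\
E\Bigl[\prod_{j=1}^{m} w_j^{\,s}\Bigr] &= \frac{\Gamma(m)\,[\Gamma(1+s)]^{m}}{\Gamma(m+ms)}
\quad \text{(the $w_j$ being uniformly distributed over the simplex $\textstyle\sum w_j = 1$)}, \\
E\bigl[e^{\,i\theta \log \Lambda}\bigr] &= m^{i\theta}\,
\frac{\Gamma(m)\,[\Gamma(1+i\theta/m)]^{m}}{\Gamma(m+i\theta)}
\quad \text{on putting } s = i\theta/m .
\end{aligned}
```

The constant $c$ drops out because the ratio is unaffected by a change of scale.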


The cumulants of $\log(\Lambda_{\infty m})$ may be calculated from this expression with the help of Gauss’s
formula for the logarithmic derivate of the $\Gamma$ function:

$$\frac{d}{dz}\log\Gamma(z) = \int_0^\infty \left[\frac{e^{-t}}{t} - \frac{e^{-zt}}{1 - e^{-t}}\right] dt. \tag{6-7}$$


Applying this we find that

$$E[\log(\Lambda_{\infty m})] = \log(m) - \left[1 + \frac12 + \frac13 + \cdots + \frac{1}{m-1}\right], \tag{6-8}$$

$$\operatorname{var}[\log(\Lambda_{\infty m})] = \frac1m\left[1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right] - \left[\frac{1}{m^2} + \frac{1}{(m+1)^2} + \cdots\right], \tag{6-9}$$

and that for the $j$th cumulant in general ($j = 2, 3, 4, \ldots$)

$$\kappa_j\{\log(\Lambda_{\infty m})\} = (-1)^j (j-1)!\left[m^{1-j}\sum_{\nu=1}^{\infty}\nu^{-j} - \sum_{\nu=0}^{\infty}(m+\nu)^{-j}\right]. \tag{6-10}$$

We see from (6-8) that the asymptotic value of $-E[\log(\Lambda_{\infty m})]$ as $n$ (and thus $m$) increases is
the Euler-Mascheroni constant, $\gamma = 0.5772\ldots$. That is, the asymptotic value of $E(\Lambda_{\infty m})$ is
$e^{-\gamma} = 0.5615\ldots$. We can interpret this by saying that the asymptotic effect of the above
graduation procedure, when applied to a finite series, is to produce a residual whose variance
is only $e^{-\gamma}$ of the true residual variance. The occurrence of the transcendental $e^{-\gamma}$ in this
connexion is perhaps unexpected.
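
The approach of the right-hand side of (6-8) to $-\gamma$ is easily seen numerically. The following is a minimal sketch with illustrative function names; the first function simply forms the ratio of geometric to arithmetic mean for any set of positive ordinates.

```python
import math


def gm_am_ratio(y):
    """Ratio of the geometric to the arithmetic mean of positive ordinates."""
    m = len(y)
    log_gm = sum(math.log(v) for v in y) / m
    return math.exp(log_gm) / (sum(y) / m)


def expected_log_ratio(m):
    """Right-hand side of (6-8): log m - (1 + 1/2 + ... + 1/(m-1))."""
    return math.log(m) - sum(1.0 / v for v in range(1, m))


EULER_GAMMA = 0.5772156649015329

for m in (10, 100, 10000):
    print(m, round(expected_log_ratio(m), 5))
print(round(math.exp(-EULER_GAMMA), 4))
```

The ratio equals one only when all ordinates are equal, and is otherwise smaller, which is why its logarithm has a negative expectation.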

It would not be at all impractical to calculate the statistic $\Lambda_{\infty m}$, since the equidistant
periodogram ordinates are often calculated as part of a routine analysis, and for large
$m$ it would certainly suffice to calculate every second or third ordinate. However, we see
from equation (6-9) that $\Lambda_{\infty m}$ has variance $O(n^{-1})$, while it is obvious from equation (3-1)
that the corresponding $\Lambda$ there has variance $O(n^{-2})$, in precisely the same way that the
ordinary periodogram ordinate has variance $O(1)$, while that derived from the truncated
correlogram has variance $O(n^{-1})$, and for precisely the same reason. Thus although $\Lambda_{\infty m}$
appeals as being rather less arbitrary in its construction, the $\Lambda$ based upon a limited
graduation has the advantage of a variance of lower order. Indeed, it is fortunate that $\Lambda$’s
distribution is so concentrated, since the $\hat U$ values for quite distinct hypotheses usually
differ surprisingly little, despite their optimum nature, and it can be correspondingly
difficult to discriminate between them.


7. This concluding note will serve to deal with a few miscellaneous points. First, it is 
natural that in many cases the class of hypotheses (2-1) should prove too restricted. The 
most immediate generalization would be to include deterministic components, and it may 
be shown that the methods of this paper may to a large degree be extended to cover this 
case. That deterministic component which is most commonly of interest is a constant mean.
The methods of the appendix may be used to show that the least-squares estimate of the
mean, $\bar x$, is asymptotically

$$\bar x = \frac1n\sum_{t=1}^{n} x_t, \tag{7-1}$$

whatever the form of $G(\omega)$, and that the fitting of $\bar x$ has the same effect on the residual sum
of squares as the fitting of any of $G$’s parameters.

In our examples, we have used the test of fit to decide between different hypotheses, 
simply accepting that hypothesis which gave best fit. The justification for this must be, 
that if we lack all a priori information, and thus have no reason to favour any of the 
hypotheses considered, then our only basis of judgement is the statistical fit of each 
hypothesis considered for itself (although we have seen that we cannot escape defining the 
general class of hypotheses within which we work). No discriminatory test can influence 
the order of preference thus obtained, since if hypothesis $H_1$ gives better fit than $H_2$, then
the observed value of the likelihood (or variance) ratio will be relatively probable on
hypothesis $H_1$ and relatively improbable on $H_2$. Thus, if we consider r null hypotheses, then
the r(r — 1) discriminatory tests will give an answer which is in fact less informative and less 
easily interpreted than that given by the r tests of fit. However, the discriminatory tests 
have their place. A series of discriminatory tests with constant null hypothesis H gives the 
probabilities that the relative order of the hypotheses (when these are arranged in order 
of observed fit) should in truth be other than it is. It is just this which we should require 
to know if we had in advance some special reason for favouring H, i.e. if we possessed 
a priori information. The discriminatory test then tells us if the actual grading of the 
hypothesis H can reasonably be reconciled with the fact that it is, after all, the true one. 
However, as emphasized, it is only when we have such advance information that we have 
reason to fasten upon a hypothesis in this fashion. 


APPENDIX 


The true residual sum of squares is given by the formula 





*" f(w) 
= — 1 
al, on (1) 
Suppose that $G(\omega)$ involves unknown parameters $\theta_1, \theta_2, \ldots, \theta_p$, so that $\hat U_p$, the minimized
residual sum of squares, is obtained by minimizing (1) with respect to these parameters.
From the minimization conditions we readily find that

$$\hat U_p = U - \tfrac12 U_\theta' U_{\theta\theta}^{-1} U_\theta + O(n^{-1/2}), \tag{2}$$

where $U_\theta$ is the $p \times 1$ vector of first derivatives of $U$ with respect to $\theta_1, \theta_2, \ldots, \theta_p$, and $U_{\theta\theta}$
the matrix of the corresponding second derivatives. The moments of $U$ and its derivatives
may be calculated with the help of the author’s result (see Whittle, 1951) that the $j$th
cumulant of a linear function of the autocovariances of a Gaussian series


(° 22 
g= it fv) Q(w) dw [Q(-w) = Q(w)], (3) 


is asymptotically given by 
= ! 
k;= e —h™ “i. [F(w) Q(w)} dw. (4) 


Thus, remembering that F(w) = vG(w) and that {- log G(w) dw = 0, we find that 


0 


nev en , “ 
B59) = > | ap ( ;) dw = - 50,3  flogdde = 0 (5) 
(OU eU\ _ 2nv* (elogG clogG 
ick (ao 55.) ~ on J 00, OO, 1 (6) 
oU nv o Be _ nv [dlogG clogG 
B(:5,29,) = 2 " ay) 9a, 00). () a - =| 00, : Soe 7) 


With a comparison of (6) and (7) it is obvious that $E(U_\theta U_\theta') = 2vE(U_{\theta\theta})$, so that

$$\begin{aligned}
(U - \hat U_p)/v &= \tfrac12 v^{-1} U_\theta' U_{\theta\theta}^{-1} U_\theta\,(1 + O(n^{-1/2})) \\
&= \tfrac12 v^{-1} U_\theta' (E U_{\theta\theta})^{-1} U_\theta\,(1 + O(n^{-1/2})) \\
&= U_\theta' [E(U_\theta U_\theta')]^{-1} U_\theta\,(1 + O(n^{-1/2})).
\end{aligned} \tag{8}$$

Now, by Bernstein’s extension of the central limit theorem (Bernstein, 1927), the $U_\theta$ vector
has a distribution which is asymptotically normal. Thus remembering (5) and (8) we see
that $(U - \hat U_p)/v$ is asymptotically distributed as $\chi^2$ with $p$ degrees of freedom.

Similarly, if another $q$ parameters are estimated, then $(U - \hat U_{p+q})/v$ is asymptotically
distributed as $\chi^2$ with $p+q$ degrees of freedom. Further, since the variates contributing
to $U - \hat U_p$ are also found in $U - \hat U_{p+q}$, the partition theorem states that $(\hat U_p - \hat U_{p+q})/v$ is
asymptotically distributed as $\chi^2$ with $q$ degrees of freedom.

We can now use either $\hat U_p$ or $\hat U_{p+q}$ to estimate $v$, since the null hypothesis is that the
$q$ extra parameters are redundant. If we use $\hat U_{p+q}$ then it is obvious from the above that
$E(\hat U_{p+q}) = (n-p-q)v$, so that

$$\psi^2 = (n-p-q)(\hat U_p - \hat U_{p+q})/\hat U_{p+q} \tag{9}$$

is asymptotically distributed as $\chi^2$ with $q$ degrees of freedom. Here we have not ‘Studentized’,
since the simultaneous distribution of $\hat U_p - \hat U_{p+q}$ and $\hat U_{p+q}$ is not a simple one. The approximation
is of the same order as the preceding ones, however.

It is evident that the scale factor of the series is immaterial, i.e. $\psi^2$ may be regarded as
a function of the autocorrelation coefficients. Thus, by a result of Bartlett (1946), its distribution
is asymptotically independent of the distribution of the residual variate $\epsilon$, if only
the $\epsilon$’s are independently distributed. Hence our assumption that the process was a Gaussian
one may be dispensed with, and the distribution calculated holds generally, although to
a lower degree of approximation.


SUMMARY 


A statistic for testing the fit of a general class of time-series models is proposed, which is
asymptotically distributed as $\chi^2$. The limit form of the statistic is shown to be the ratio of
the geometric and arithmetic means of the residual variates’ periodogram, whose cumulants 
are calculated. The test is applied to artificial and observed material. 


Finally, the author would wish to express his indebtedness to the referee for a number of 
suggestions contributing sensibly to the clarity of the paper.


TENSOR NOTATION AND THE SAMPLING CUMULANTS
OF k-STATISTICS*


By E. L. KAPLAN 
Bell Telephone Laboratories, Murray Hill, New Jersey 


1. SUMMARY 


Now and then in the literature one finds results relating to multivariate distributions which 
are derived virtually independently of, or with considerable effort from, the corresponding 
univariate relations, whereas they are in fact only very mild generalizations of the latter, 
as will be shown. Only the familiar concepts of moments, characteristic functions, cumulants,
and k-statistics and their sampling cumulants will be discussed here. It should be emphasized 
that these concepts are identical with those ordinarily used in multivariate situations; the 
only novelty lies in the concise manner of representing and handling them. 


2. ELEMENTARY APPLICATIONS OF TENSOR NOTATION 


A vector random variable at its simplest is represented by a letter with a single undetermined 
subscript; thus $(x_i) = (x_1, \ldots, x_p)$. It could consist of a sample of p values (independent or
not) of a single variable, but this is a special application which is not intended in general. 
If it is convenient the individual variables may carry several subscripts, as in a model for 
an analysis of variance or in representing a sample of a vector variable; results for single 
subscripts are easily adapted to this case. If the vector notation is generalized by allowing 
more than one variable index, tensors are obtained, while ordinary variables or numbers, 
with no variable index, are called scalars. In the tensor notation $E(x_i^2 x_j)$ may be represented
by $\mu_{iij}$, which is non-committal regarding the range 1 to p of the indices. This fact gives it
much greater generality than is possessed by the alternative ‘power’ notation which, with
p = 3 for example, writes $E(x_1^2 x_3) = \mu_{[201]}$. (The brackets will be used here to distinguish
the notations.) The latter is admittedly more compact in representing high-order moments 
of a very few variables, writing /;,;) in place of /¢,;;20000, ete. Of course, it is always possible 
to specialize a tensor formula by making certain of the subscripts equal. 

In the tensor notation there is nothing esoteric about the notion of products and powers 
of vectors and tensors; $(u_{ij})(v_k)^2$ is simply $u_{ij}v_k v_l$, which, of course, may also be written
$v_k v_l u_{ij}$ or in any other order, and is a special case of the general tensor $T_{ijkl}$ of rank four.
The non-commutativity of vector multiplication which must be assumed when tensor
indices are not used, disappears when they are introduced, because the indices themselves
take care of the matter. The concept of symmetry replaces that of commutativity, so that
$a_i b_j = a_j b_i$ is the equivalent of $ab = ba$.

Since the moment $\mu_{i_1 \ldots i_r}$ is defined as $E(x_{i_1} \ldots x_{i_r})$, it is obviously completely symmetric.
In tensor notation the moment-generating function

$$E\{\exp \textstyle\sum x_i t^i\}$$

may be written

$$\phi(t^i) = 1 + \sum_{i}\mu_i t^i + \frac{1}{2!}\sum_{i,j}\mu_{ij}t^i t^j + \frac{1}{3!}\sum_{i,j,k}\mu_{ijk}t^i t^j t^k + \ldots,$$

the superscripts being indices, not exponents. Taking the logarithm of $\phi(t^i)$ (which is, of
course, a scalar-valued function, though its argument is a vector) gives the cumulant
generating function as a series of algebraic forms of ascending degrees in the $t^i$, whose tensor
coefficients are the tensor cumulants themselves. There is no loss of generality, but a great
gain in convenience, in assuming as usual that these tensor coefficients (the cumulant
tensors) are symmetrical. Finally, since the cumulants are symmetric tensors, the same
must be true of their estimators, the k-statistics.

* Prepared in connexion with research sponsored by the Office of Naval Research.

Symmetry in a tensor is an important property, as it permits the full tensor equation to 
be inferred uniquely from the corresponding scalar equation (p = 1) in the cases to be 
considered here. The proof by undetermined coefficients is obvious if one notes that each 
of the tensor subscripts (assumed distinct) must occur exactly once in each term, and the 
coefficient cannot depend upon the indices since these are arbitrarily assignable labels. For 
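example, from the scalar relations

$$\kappa_{[1]} = \mu_{[1]}, \qquad \kappa_{[2]} = \mu_{[2]} - \mu_{[1]}^2, \qquad \kappa_{[3]} = \mu_{[3]} - 3\mu_{[1]}\mu_{[2]} + 2\mu_{[1]}^3,$$

$$\kappa_{[4]} = \mu_{[4]} - 4\mu_{[1]}\mu_{[3]} - 3\mu_{[2]}^2 + 12\mu_{[1]}^2\mu_{[2]} - 6\mu_{[1]}^4,$$

we may derive the tensor equations

$$\begin{aligned}
\kappa_i &= \mu_i, \qquad \kappa_{ij} = \mu_{ij} - \mu_i\mu_j, \\
\kappa_{ijk} &= \mu_{ijk} - (\mu_i\mu_{jk} + \mu_j\mu_{ik} + \mu_k\mu_{ij}) + 2\mu_i\mu_j\mu_k, \\
\kappa_{ijkl} &= \mu_{ijkl} - \sum^{4}\mu_i\mu_{jkl} - \sum^{3}\mu_{ij}\mu_{kl} + 2\sum^{6}\mu_i\mu_j\mu_{kl} - 6\mu_i\mu_j\mu_k\mu_l.
\end{aligned} \tag{1}$$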

The last equation is equivalent to five scalar formulae, those for $\kappa_{[1111]}$, $\kappa_{[211]}$, $\kappa_{[22]}$, $\kappa_{[31]}$ and
$\kappa_{[4]}$. The summations are over the possible ways of grouping the subscripts, and the number
of terms resulting is written over the $\sum$. In general, the numerical coefficient is $(-1)^{P-1}(P-1)!$,
where P is the number of $\mu$’s occurring in the product. In the inverse tensor formulas for the
$\mu$’s in terms of the $\kappa$’s, all coefficients are +1. In short, symmetry of the tensor is a sufficient
and apparently also a necessary condition that the tensor generalization of scalar equations
of the type considered be a trivial process.
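
The rule just stated, a coefficient $(-1)^{P-1}(P-1)!$ attached to every grouping of the subscripts into P sets, lends itself to a short computational sketch. The function names and the small equiprobable population below are hypothetical, chosen only so that the result can be checked by hand; exact fractions are used to avoid rounding.

```python
import math
from fractions import Fraction


def set_partitions(items):
    """Yield every grouping of the list `items` into non-empty sets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                    # `first` in a set of its own
        for b in range(len(part)):                # or joined to an existing set
            yield part[:b] + [[first] + part[b]] + part[b + 1:]


def joint_cumulant(indices, moment):
    """Sum over groupings of (-1)^(P-1) (P-1)! times the product of moments."""
    total = 0
    for part in set_partitions(list(indices)):
        p = len(part)
        term = (-1) ** (p - 1) * math.factorial(p - 1)
        for block in part:
            term *= moment(tuple(block))
        total += term
    return total


# Hypothetical equiprobable bivariate population, used only as a check.
POINTS = [(0, 0), (1, 0), (1, 2), (3, 1)]


def moment(idx):
    return sum(Fraction(math.prod(pt[i] for i in idx)) for pt in POINTS) / len(POINTS)


print(joint_cumulant((0, 1), moment))        # the covariance
print(joint_cumulant((0, 0, 0), moment))     # third cumulant of the first variable
print(joint_cumulant((0, 0, 1, 1), moment))  # kappa_[22] in the power notation
```

Repeated indices are treated as distinct positions, which is exactly the specialization of a tensor formula obtained by making certain subscripts equal; a rank-four cumulant involves fifteen groupings, in agreement with the 1 + 4 + 3 + 6 + 1 terms of (1).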

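Similarly, if $s_{ij\ldots l}$ denotes the products $x_i x_j \ldots x_l$ summed over the sample, the tensor
formulae for the k-statistics may be shown to be as follows:

$$\begin{aligned}
k_i &= s_i/n, \qquad k_{ij} = (n s_{ij} - s_i s_j)/n^{(2)}, \\
k_{ijk} &= \Bigl(n^2 s_{ijk} - n\sum^{3} s_i s_{jk} + 2 s_i s_j s_k\Bigr)\Big/ n^{(3)}, \\
k_{ijkl} &= \Bigl[n^2(n+1) s_{ijkl} - n(n+1)\sum^{4} s_i s_{jkl} - n(n-1)\sum^{3} s_{ij}s_{kl} + 2n\sum^{6} s_i s_j s_{kl} - 6 s_i s_j s_k s_l\Bigr]\Big/ n^{(4)}.
\end{aligned} \tag{2}$$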
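
The first formulae of (2) can be sketched directly in terms of the power sums. In the sketch below $n^{(r)}$ is taken to mean $n(n-1)\ldots(n-r+1)$, which is an assumption about the notation; the function names and the three-observation sample are hypothetical.

```python
from fractions import Fraction


def power_sum(sample, idx):
    """s_{idx}: product of the indexed coordinates, summed over the sample."""
    total = 0
    for obs in sample:
        prod = 1
        for i in idx:
            prod *= obs[i]
        total += prod
    return total


def k2(sample, i, j):
    """k_ij = (n s_ij - s_i s_j) / n(n-1)."""
    n = len(sample)
    s = lambda *idx: power_sum(sample, idx)
    return Fraction(n * s(i, j) - s(i) * s(j), n * (n - 1))


def k3(sample, i, j, k):
    """k_ijk, the sum of three terms s_i s_jk being written out in full."""
    n = len(sample)
    s = lambda *idx: power_sum(sample, idx)
    three = s(i) * s(j, k) + s(j) * s(i, k) + s(k) * s(i, j)
    num = n * n * s(i, j, k) - n * three + 2 * s(i) * s(j) * s(k)
    return Fraction(num, n * (n - 1) * (n - 2))


sample = [(1, 0), (2, 1), (4, 1)]  # hypothetical integer data
print(k2(sample, 0, 0), k2(sample, 0, 1), k3(sample, 0, 0, 0))
```

With equal subscripts `k2` reduces to the usual unbiased sample variance, and `k3` to the univariate third k-statistic.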


3. SAMPLING CUMULANTS OF k-STATISTICS 


Wishart (1929) and Fisher (1930) early considered the generality which could be achieved 
by admitting into a statistic a number of chance variables equal to the degree of the 
statistic, and which is equivalent to the generality of the tensor notation. They seem to 
have regarded the method as usually too cumbersome. However, it is possible at least to 
write down the tensor forms of the sampling cumulants of k-statistics in little more space
than the one-variable case requires, and with little additional calculation, as will be shown.
Thus we have

$$\kappa(ab, ij) = E[(k_{ab} - \kappa_{ab})(k_{ij} - \kappa_{ij})] = \kappa_{abij}/n + (\kappa_{ai}\kappa_{bj} + \kappa_{aj}\kappa_{bi})/(n-1). \tag{3}$$












This is equivalent to seven types of scalar formulae, as follows; since the subscripts are of
the ‘power’ type, they would be bracketed but for typographical reasons. They comprise
the variance of a variance or covariance:

$$\operatorname{var} k_2 = \kappa_4/n + 2\kappa_2^2/(n-1) \qquad (a = b = i = j),$$
$$\operatorname{var} k_{11} = \kappa_{22}/n + (\kappa_{20}\kappa_{02} + \kappa_{11}^2)/(n-1) \qquad (a = i,\ b = j);$$

the covariance of two variances:

$$\operatorname{cov}(k_{20}, k_{02}) = \kappa_{22}/n + 2\kappa_{11}^2/(n-1) \qquad (a = b,\ i = j);$$

the covariance of a variance and a covariance:

$$\operatorname{cov}(k_{20}, k_{11}) = \kappa_{31}/n + 2\kappa_{20}\kappa_{11}/(n-1) \qquad (a = b = i),$$
$$\operatorname{cov}(k_{200}, k_{011}) = \kappa_{211}/n + 2\kappa_{110}\kappa_{101}/(n-1) \qquad (a = b);$$

and the covariance of two covariances:

$$\operatorname{cov}(k_{110}, k_{101}) = \kappa_{211}/n + (\kappa_{200}\kappa_{011} + \kappa_{110}\kappa_{101})/(n-1) \qquad (a = i),$$
$$\operatorname{cov}(k_{1100}, k_{0011}) = \kappa_{1111}/n + (\kappa_{1010}\kappa_{0101} + \kappa_{1001}\kappa_{0110})/(n-1).$$

After the indicated identifications have been made among the subscripts, the writing out of
the special cases involves a mere change to the power notation.
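
Formula (3) and its special cases can be checked exactly on a small example by enumerating every sample of size n drawn independently from a discrete population. The population, the function names and the choice n = 3 below are hypothetical, and the sketch is a numerical check under those assumptions rather than a derivation.

```python
import math
from fractions import Fraction
from itertools import product

# Hypothetical equiprobable bivariate population, chosen only for the check.
POINTS = [(0, 0), (1, 0), (1, 2), (3, 1)]


def moment(idx):
    return sum(Fraction(math.prod(pt[i] for i in idx)) for pt in POINTS) / len(POINTS)


def set_partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for b in range(len(part)):
            yield part[:b] + [[first] + part[b]] + part[b + 1:]


def kappa(*idx):
    """Population cumulant from the moments, by the tensor form of (1)."""
    total = Fraction(0)
    for part in set_partitions(list(idx)):
        term = Fraction((-1) ** (len(part) - 1) * math.factorial(len(part) - 1))
        for block in part:
            term *= moment(tuple(block))
        total += term
    return total


def k2(sample, i, j):
    n = len(sample)
    s_ij = sum(x[i] * x[j] for x in sample)
    s_i = sum(x[i] for x in sample)
    s_j = sum(x[j] for x in sample)
    return Fraction(n * s_ij - s_i * s_j, n * (n - 1))


def exact_cov(n, a, b, i, j):
    """Covariance of k_ab and k_ij over all equally likely samples of size n."""
    samples = list(product(POINTS, repeat=n))
    count = len(samples)
    mean_ab = sum(k2(s, a, b) for s in samples) / count
    mean_ij = sum(k2(s, i, j) for s in samples) / count
    mean_prod = sum(k2(s, a, b) * k2(s, i, j) for s in samples) / count
    return mean_prod - mean_ab * mean_ij


def formula_3(n, a, b, i, j):
    return kappa(a, b, i, j) / n + (kappa(a, i) * kappa(b, j) + kappa(a, j) * kappa(b, i)) / (n - 1)


for idx in [(0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 1), (0, 0, 0, 1)]:
    print(idx, exact_cov(3, *idx), formula_3(3, *idx))
```

The four index patterns printed correspond to the variance of a variance, the variance of a covariance, the covariance of two variances, and the covariance of a variance and a covariance.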
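Some additional formulae are the following:

$$\kappa(ab, ijk) = \kappa_{abijk}/n + \sum^{6}\kappa_{ai}\kappa_{bjk}/(n-1), \tag{4}$$

$$\kappa(ab, ijkl) = \kappa_{abijkl}/n + \Bigl(\sum^{8}\kappa_{ai}\kappa_{bjkl} + \sum^{6}\kappa_{aij}\kappa_{bkl}\Bigr)\Big/(n-1), \tag{5}$$

$$\begin{aligned}
\kappa(ab, ij, pq) = {} & \kappa_{abijpq}/n^2 + \sum^{12}\kappa_{abip}\kappa_{jq}/n(n-1) + \sum^{4}\kappa_{aip}\kappa_{bjq}(n-2)/n(n-1)^2 \\
& + \sum^{8}\kappa_{ai}\kappa_{bp}\kappa_{jq}/(n-1)^2,
\end{aligned} \tag{6}$$

$$\kappa(abc, ijk) = \kappa_{abcijk}/n + \Bigl(\sum^{9}\kappa_{ai}\kappa_{bcjk} + \sum^{9}\kappa_{abi}\kappa_{cjk}\Bigr)\Big/(n-1) + \sum^{6}\kappa_{ai}\kappa_{bj}\kappa_{ck}\,n/(n-1)(n-2), \tag{7}$$

$$\begin{aligned}
\kappa(abc, ijkl) = {} & \kappa_{abcijkl}/n + \sum^{12}\kappa_{ai}\kappa_{bcjkl}/(n-1) + \Bigl(\sum^{12}\kappa_{abi}\kappa_{cjkl} + \sum^{18}\kappa_{aij}\kappa_{bckl}\Bigr)\Big/(n-1) \\
& + \sum^{36}\kappa_{ai}\kappa_{bj}\kappa_{ckl}\,n/(n-1)(n-2),
\end{aligned} \tag{8}$$

$$\begin{aligned}
\kappa(ab, ij, pq, uv) = {} & \kappa_{abijpquv}/n^3 + \sum^{24}\kappa_{ai}\kappa_{bjpquv}/n^2(n-1) + \sum^{32}\kappa_{aip}\kappa_{bjquv}(n-2)/n^2(n-1)^2 \\
& + \sum^{8}\kappa_{aipu}\kappa_{bjqv}(n^2-3n+3)/n^2(n-1)^3 + \sum^{24}\kappa_{abpu}\kappa_{ijqv}/n^2(n-1) \\
& + \Bigl(\sum^{96}\kappa_{ai}\kappa_{bp}\kappa_{jquv} + \sum^{48}\kappa_{ai}\kappa_{pu}\kappa_{bjqv}\Bigr)\Big/n(n-1)^2 \\
& + \sum^{96}\kappa_{ai}\kappa_{bpu}\kappa_{jqv}(n-2)/n(n-1)^3 + \sum^{48}\kappa_{ai}\kappa_{bp}\kappa_{qu}\kappa_{jv}/(n-1)^3,
\end{aligned} \tag{9}$$

$$\begin{aligned}
\kappa(abcd, ijkl) = {} & \kappa_{abcdijkl}/n + \sum^{16}\kappa_{ai}\kappa_{bcdjkl}/(n-1) + \sum^{48}\kappa_{abi}\kappa_{cdjkl}/(n-1) \\
& + \sum^{72}\kappa_{ai}\kappa_{bj}\kappa_{cdkl}\,n/(n-1)(n-2) + \Bigl(\sum^{16}\kappa_{aijk}\kappa_{bcdl} + \sum^{18}\kappa_{abij}\kappa_{cdkl}\Bigr)\Big/(n-1) \\
& + \sum^{144}\kappa_{ai}\kappa_{bcj}\kappa_{dkl}\,n/(n-1)(n-2) \\
& + \sum^{24}\kappa_{ai}\kappa_{bj}\kappa_{ck}\kappa_{dl}\,n(n+1)/(n-1)(n-2)(n-3).
\end{aligned} \tag{10}$$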


Equations (3), (6) and (9) embrace some nineteen formulae given by Miss Cook (1951),
but some care is required in deriving particular cases.

These tensors are only partially symmetric, as the grouping of the subscripts is intended
to indicate; hence they cannot always be deduced from the univariate forms as given, for 
example, in Kendall (1948, pp. 268-74). However, it may be noted that the two sets of 
formulae differ only in that pattern functions* from different arrays are never merged in 
the tensor formulae, and the numerical coefficients do not occur as such, but represent 
instead the number of terms occurring in a symmetric group of terms; they have been 
written above the summation signs. The group is composed of all distinct terms obtain- 
able by any combination of permutations of subscripts within their sets (marked off by 
commas), and of permutations among sets of equal size. The required pattern functions 
may be obtained from Fisher (1930, pp. 224-6). 

The formulae in which one or more sets contain only one subscript are obtained in 
a manner analogous to the usual one, by affixing the subscript in every possible position and 
dividing by n; thus 

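$$\kappa(i, j, kl) = \kappa_{ijkl}/n^2 = \kappa(i, k, jl), \text{ etc.},$$

$$\kappa(ab, ijk, p) = \kappa_{abijkp}/n^2 + \Bigl(\sum^{6}\kappa_{aip}\kappa_{bjk} + \sum^{6}\kappa_{ai}\kappa_{bjkp}\Bigr)\Big/n(n-1).$$
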
The result is that every sampling cumulant in which at most one set contains more than 
one index is equal to a population cumulant divided by a power of n, and is therefore 
a symmetric tensor. 

The references cited include examples of the derivation of bivariate and trivariate sampling 
cumulants, in which multipartite partitions replace the simple ones. Here again a change of
notation makes Fisher’s method as easy to apply to the tensor as to the univariate case. 
Instead of representing the multipartite partitions as ordered sets of zeros and ones, we 
write down those indices to which the ones belong. The subscripts can be allocated in only 
one way; hence, the numerical coefficients are all unity as stated before, and only the 
pattern function remains to be determined exactly as in the univariate case. For example, 
the array corresponding to the term $\kappa_{ai}\kappa_{bp}\kappa_{jquv}$ in $\kappa(ab, ij, pq, uv)$ in the multipartite and in
the tensor notation respectively is

[10, 00, 00, 00]   [00, 10, 00, 00]          .                  .           |  [10, 10, 00, 00]
[01, 00, 00, 00]          .          [00, 00, 10, 00]           .           |  [01, 00, 10, 00]
       .           [00, 01, 00, 00]  [00, 00, 01, 00]   [00, 00, 00, 11]    |  [00, 01, 01, 11]

  a      i      .      .     |  ai
  b      .      p      .     |  bp
  .      j      q      uv    |  jquv
 (ab)   (ij)   (pq)   (uv)

The pattern of non-zero entries is

  x   x   .   .
  x   .   x   .
  .   x   x   x

Of course, the 95 other terms of the group have the same pattern function 1/n(n—1)?. 


* For an explanation of this and the various following references to Fisher’s rules for the derivation 
of these formulae, see Kendall (1948, pp. 261 et seq.).

According to Fisher’s rules, the following three types of arrays make no contribution to
the result (the corresponding terms do not appear): 

(1) those containing a row having only one element; 

(2) more generally, those whose elements are divisible into two arrays lying in different 
rows and having only one column in common; 

(3) those divisible into two arrays lying in different rows and columns. 


Statistical reasons are adduced (Kendall, 1948, pp. 276-9) for the first two, while a somewhat 
difficult direct proof is given for the last type, which does not, in fact, have a vanishing 
pattern function but is ignored nevertheless. The tensor method shows conclusively that 
terms of the third type cannot be present. One assigns to each index a distinct fixed value, 
corresponding to a particular random variable. If the arrays of this type are considered in 
a suitable sequence, it is found that given any such term $T$, the random variables may be
grouped into sets (as they are grouped in $T$) such that if variables in different sets are assumed
to be statistically independent, then both the sampling cumulant and all of its terms
(except $T$) are known either to vanish or not to occur in the formula. Since $T$ itself need not
vanish, it cannot occur in the formula. Thus, to show that $T = \kappa_{ab}\kappa_{cd}\kappa_{ij}\kappa_{kl}$ does not occur
in (10), assume that $x_u$ and $x_v$ are independent unless $uv$ is one of the pairs $ab$, $cd$, $ij$, $kl$;
then to show that $\kappa_{ab}\kappa_{cd}\kappa_{ijkl}$ does not occur assume only that the sets $ab$, $cd$, $ijkl$ are
independent, etc.

Apparently this argument would not apply to sampling from a finite population. It may 
be noted that Kendall (1940) has devised a method for deriving multivariate sampling 
cumulants from univariate ones by means of symbolic operators. It does not fully display 
the unity of the problem, however, and the examples do not go beyond bivariate dis- 
tributions. The formulae for sampling from a finite population (see Irwin & Kendall, 1944) 
could be similarly generalized to tensor form.


RANK ANALYSIS OF INCOMPLETE BLOCK DESIGNS
I. THE METHOD OF PAIRED COMPARISONS


By RALPH ALLAN BRADLEY AND MILTON E. TERRY*
Virginia Agricultural Experiment Station of the Virginia Polytechnic Institute 


1. INTRODUCTION 


The analysis of experiments involving paired comparisons has received considerable atten- 
tion in statistical methodology. Thurstone (1927) has considered the problem on the assump- 
tions that a linear variate is involved and that perceptible differences exist among the items 
presented for comparison. More recently, Mosteller (1951a, b) has elaborated upon
Thurstone's method and, having postulated a sensation continuum over which sensations are
jointly normally distributed, has developed a $\chi^2$ test following transformation of the observed
variates.

Kendall & Babington Smith (1940) proposed a method of analysis for paired comparisons 
which does not depend on assumptions of a linear variate or of normality, and the procedure 
may be described as a combinatorial type test. They form a coefficient of agreement which 
essentially measures discrepancies from perfect agreement, although the model used in 
the test is not explicitly formulated. In subjective tests the consistency of a judge is mea- 
sured in terms of circular triads. We note that tests of consistency and tests of agreement, 
when differences are known to exist, may also be considered to be tests of null hypotheses 
upon the postulation of absence of differences. 

Guttman (1946) has developed a method of quantifying paired comparisons. His problem 
was to determine a numerical value for each of a number of items which will best represent 
the comparisons in some sense. The problem may be considered to be one of estimation as 
distinct from the problem of testing hypotheses. 

When only two items are to be compared in a ranking experiment, a test of the hypothesis 
of no-difference between them on some characteristic may be based on the binomial dis- 
tribution. The estimation of the probabilities that the items are superior in a given com- 
parison may be accomplished, and these estimates afford a method of rating the items or 
a method of quantification. In the present paper a generalization of the binomial model 
and distribution is obtained. 

A mathematical model is formulated and maximum-likelihood estimates of treatment 
ratings provide a simple solution to Guttman’s problem of quantification. Likelihood ratio 
statistics are used for tests of a specified class of hypotheses. Although these tests basically 
agree with those of Kendall & Babington Smith, they subdivide the possible results from 
an experiment of a given size into more distinct subclasses, thus perhaps indicating better 
sensitivity. 

The test procedure is flexible. In subjective testing the experimenter may assume
a priori that the standards of judging are uniform or that they vary by judges, by time, or
by both. That is, true treatment ratings may be considered to be constant throughout an
experiment or to be functions of judges and of time. In experiments concerned with the
detection of treatment differences the latter alternatives are important. Certain test
characteristics may be rather intangible and difficult to describe, with the added difficulty
that personal preferences may influence judges' decisions. Even in simpler cases the research
worker may desire to forgo the tedious procedure of training and co-ordinating judges.
In these situations tests of treatment differences may be performed and a measure of the
agreement among judges obtained, although estimates of over-all treatment ratings are
not usually meaningful.

* This project was supported by funds from the Research and Marketing Act of 1946, under
Contract No. A-1s-32683, with the Bureau of Agricultural Economics.

Tables for the test procedures for small treatment and sample sizes are provided and 
asymptotic distributions are considered. The use of the tables and the method of analysis 
are illustrated with an example of a taste-testing experiment on pork roasts from animals 
fed on one of three corn rations with peanut supplements.


2. MATHEMATICAL MODEL 


Let us consider $t$ treatments in an experiment involving paired comparisons. We shall first
consider that these treatments have true ratings (or preferences), $\pi_1, \dots, \pi_t$, on a particular
subjective continuum throughout an experiment. The continuum is specialized by the
requirements that every $\pi_i \ge 0$ and that $\sum_i \pi_i = 1$, the latter condition being added for
convenience. Further definition follows with the assumption that, when treatment $i$ appears with
treatment $j$ in a block, the probability that treatment $i$ obtains top rating (or a rank of 1)
is $\pi_i/(\pi_i + \pi_j)$. Later generalization will require the addition of a second subscript on the
parameters indicative of judges or time.
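
As a small numerical illustration of the model, the sketch below evaluates the probability $\pi_i/(\pi_i + \pi_j)$ for every pair of a set of ratings. The function name and the three ratings are assumptions made for the example; they are not values from the paper.

```python
def top_rank_probability(pi_i: float, pi_j: float) -> float:
    """Probability that treatment i obtains rank 1 when paired with treatment j."""
    return pi_i / (pi_i + pi_j)


# Hypothetical true ratings for t = 3 treatments (non-negative, summing to 1).
ratings = [0.5, 0.3, 0.2]

for i in range(len(ratings)):
    for j in range(i + 1, len(ratings)):
        p_ij = top_rank_probability(ratings[i], ratings[j])
        p_ji = top_rank_probability(ratings[j], ratings[i])
        # The two outcomes of a pair are complementary, so p_ij + p_ji = 1.
        print(f"P({i + 1} over {j + 1}) = {p_ij:.3f}, P({j + 1} over {i + 1}) = {p_ji:.3f}")
```

Only the ratios of the ratings enter these probabilities, which is why the condition $\sum_i \pi_i = 1$ can be imposed for convenience.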

Now $r_{ijk}$ will designate the rank of the $i$th treatment in the $k$th repetition of the block in
which treatment $i$ appears with treatment $j$. Clearly $r_{ijk} = 3 - r_{jik}$. Estimates of $\pi_1, \dots, \pi_t$
will be denoted by $p_1, \dots, p_t$ respectively, and $n$ will be reserved to denote the number of
repetitions of the design when a repetition is defined to be a set of all pairs of treatments.

In certain cases, as noted above, repetitions of the design may be performed by different
judges or at different times. We shall discuss, in a subsequent section, the analysis when
true ratings $\pi_{1u}, \dots, \pi_{tu}$ exist in the $u$th of $g$ groups not necessarily identical from group to
group.

3. THE LIKELIHOOD FUNCTION 


We may now obtain the likelihood function, assuming probability independence between
blocks or pairs of treatments. Consider the probability of the observed rankings in the $k$th
repetition for the block in which treatments $i$ and $j$ are compared. The probability of the
observed result is

$$\left(\frac{\pi_i}{\pi_i + \pi_j}\right)^{2 - r_{ijk}} \left(\frac{\pi_j}{\pi_i + \pi_j}\right)^{2 - r_{jik}}.$$

For if the $i$th treatment obtains top ranking, $r_{ijk} = 1$ and $r_{jik} = 2$, and the expression above
becomes $\pi_i/(\pi_i + \pi_j)$; alternatively, $r_{ijk} = 2$, $r_{jik} = 1$, and the probability is $\pi_j/(\pi_i + \pi_j)$.
Multiplying the appropriate expressions for all comparisons within a repetition and for all
$n$ repetitions, we reach the likelihood function in the general form

$$L = \prod_{i=1}^{t} \pi_i^{\,2n(t-1) - \sum_{j \ne i} \sum_{k} r_{ijk}} \prod_{i < j} (\pi_i + \pi_j)^{-n}. \tag{1}$$
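
The likelihood (1) depends on the data only through the treatment sums of ranks, and a minimal sketch of its evaluation follows. The function name is an assumption; the rank sums 19, 13, 13 with $n = 5$ are those reported for the first judge in the worked example of § 11, and logarithms are taken to base 10 as in the later sections.

```python
import math


def log10_likelihood(pi, rank_sums, n):
    """log10 of the likelihood (1), given ratings pi and treatment sums of ranks.

    rank_sums[i] is the sum of r_ijk over j != i and over the n repetitions.
    This sketch assumes every pi[i] > 0.
    """
    t = len(pi)
    # Exponent of pi_i in (1): 2n(t-1) minus the rank sum of treatment i,
    # which is the number of times treatment i was ranked 1.
    exponents = [2 * n * (t - 1) - r for r in rank_sums]
    first = sum(a * math.log10(p) for a, p in zip(exponents, pi))
    second = n * sum(
        math.log10(pi[i] + pi[j]) for i in range(t) for j in range(i + 1, t)
    )
    return first - second


# With equal ratings every comparison has probability 1/2, so for t = 3 and
# n = 5 (15 comparisons) the likelihood is (1/2)**15 whatever the rank sums.
value = log10_likelihood([1 / 3] * 3, [19, 13, 13], n=5)
print(round(value, 4), round(15 * math.log10(0.5), 4))
```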








When repetitions of the design are performed by groups with distinct parameters, the
likelihood function will be a product over the groups of functions of the form of (1).

4. LIKELIHOOD RATIO TESTS AND ESTIMATION

A general class of tests of the null hypothesis,
$$H_0: \pi_i = 1/t \quad (i = 1, \dots, t), \tag{2}$$
against alternative hypotheses
$$H_a: \pi_i = \pi(h) \quad (h = 1, \dots, m;\ i = s_{h-1} + 1, \dots, s_h),$$
$$\text{where } s_0 = 0,\ s_m = t \text{ and } \sum_{h} (s_h - s_{h-1})\,\pi(h) = 1, \tag{3}$$


are possible using likelihood ratio tests. That is, tests of the null hypothesis of identical 
treatment ratings may be performed when the alternative hypothesis specifies that the 
treatments have identical ratings within each of m groups of treatments while the groups 
themselves may differ. Alternative hypotheses involving only a subset of parameters do 
not lead to parameter-free tests. Two special cases of this general class of tests will be 
considered in the next section. 

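If $p(h)$ is the maximum-likelihood estimate of $\pi(h)$, these estimates are obtained from the
equations

$$\frac{1}{p(h)} \Big[ 2n(t-1)(s_h - s_{h-1}) - \sum_{i = s_{h-1}+1}^{s_h} \sum_{j \ne i} \sum_{k} r_{ijk} - \tfrac{1}{2} n (s_h - s_{h-1})(s_h - s_{h-1} - 1) \Big]$$
$$\qquad - \; n (s_h - s_{h-1}) \sum_{f \ne h} (s_f - s_{f-1}) \{ p(h) + p(f) \}^{-1} = 0 \quad (h = 1, \dots, m), \tag{4}$$
and
$$\sum_{h} (s_h - s_{h-1})\, p(h) = 1. \tag{5}$$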


Equations (4) are obtained from the reduction of the first-order maximizing conditions on 
the logarithm of the likelihood function when a Lagrange multiplier is used for the restraint 
on the parameters (3). 

The general test statistic,* a monotone function of the likelihood ratio, is

$$B = n \sum_{h < f} (s_h - s_{h-1})(s_f - s_{f-1}) \log \{ p(h) + p(f) \}$$
$$\qquad - \sum_{h} \Big[ 2n(t-1)(s_h - s_{h-1}) - \sum_{i = s_{h-1}+1}^{s_h} \sum_{j \ne i} \sum_{k} r_{ijk} - \tfrac{1}{2} n (s_h - s_{h-1})(s_h - s_{h-1} - 1) \Big] \log p(h). \tag{6}$$

$B$ is implicitly a function of the treatment sums of ranks.

Solution of equations (4) and (5) provides estimates of the true treatment ratings. Pair- 
wise comparison of these estimates provides a quantitative measure of the ratings of a pair 
of items relative to the test attribute. 

The estimates $p_i$ of $\pi_i$ may be used for pairwise comparisons of treatments in the sense
that the ratio $p_i/p_j$ measures the relative frequency of occurrence of rank 1 for treatment $i$
as compared with treatment $j$ for this particular paired comparison. If the estimates are
converted to logarithms, the values $\log p_i$ occur on a linear scale and permit over-all
comparisons of the experimental treatments. Any consideration of differences among treatments
should be based on the values of the $\log p_i$'s.


5. SPECIAL TESTS 
Two special alternative hypotheses are of particular interest. 


Case (i). $H_a$: no $\pi_i$ is assumed equal to any $\pi_j$ $(i \ne j)$. That is, in the general hypothesis
(3), $m = t$.

* We use logarithms to base 10 unless otherwise specified.

In this case equations (4) and (5) are simplified and become
$$\frac{a_i}{p_i} - n \sum_{j \ne i} (p_i + p_j)^{-1} = 0 \quad (i = 1, \dots, t), \tag{7}$$
and
$$\sum_{i} p_i = 1. \tag{8}$$
We define
$$a_i = 2n(t-1) - \sum_{j \ne i} \sum_{k=1}^{n} r_{ijk}. \tag{9}$$


Both equations (4) and (7) contain one degree of dependence and imply respectively
equations (5) and (8).
The test statistic becomes

$$B_1 = n \sum_{i < j} \log (p_i + p_j) - \sum_{i} \Big[ 2n(t-1) - \sum_{j \ne i} \sum_{k=1}^{n} r_{ijk} \Big] \log p_i. \tag{10}$$

The preparation of tables for the exact distribution of $B_1$ is discussed in the following
section.

Case (ii). $H_a$: $\pi_i = \pi$ $(i = 1, \dots, s)$; $\pi_i = (1 - s\pi)/(t - s)$ $(i = s+1, \dots, t)$. This is a reduction of the
general hypothesis to the case in which $m = 2$.

This special test is similar to certain single degree of freedom comparisons in the analysis 
of variance. It is possible to compare two groups of items so long as all experimental items 
are included in one or another of the groups; however, it should be noted that one may 
always disregard all pairs in the experiment involving one or more extraneous items and 
proceed with tests based on comparisons within any subgroup of items. 

For the special case (ii), the maximum-likelihood equations (4) and (5) may be solved
simply and the test statistic written as an explicit function of treatment sums of ranks.
When $p$ is the estimate of $\pi$, we have

$$p = \frac{ns(4t - s - 3) - 2 \sum_{i=1}^{s} \sum_{j \ne i} \sum_{k} r_{ijk}}{ns(5st - 2t^2 - 6s + 3t) - 2(2s - t) \sum_{i=1}^{s} \sum_{j \ne i} \sum_{k} r_{ijk}}, \tag{11}$$

and the statistic, by substitution in (6), is

$$B_2 = \Big[ \sum_{i=1}^{s} \sum_{j \ne i} \sum_{k} r_{ijk} + \tfrac{1}{2} ns(s-1) - 2n(t-1)s \Big] \log p$$
$$\qquad + \Big[ \sum_{i=s+1}^{t} \sum_{j \ne i} \sum_{k} r_{ijk} + \tfrac{1}{2} n(t-s)(t-s-1) - 2n(t-1)(t-s) \Big] \log \frac{1 - sp}{t - s}$$
$$\qquad + \; ns(t-s) \log \frac{(t - 2s)p + 1}{t - s}. \tag{12}$$

A discussion of the distribution of $B_2$ is included with that of the distribution of $B_1$.
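
The closed forms (11) and (12) can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are ours, `R1` and `R2` denote the total sums of ranks of the first $s$ and the remaining $t - s$ treatments, and the data are the rank sums 19, 13, 13 ($n = 5$) of the first judge in § 11 with the corn ration alone in the first group ($s = 1$).

```python
import math
from fractions import Fraction


def two_group_estimate(R1, n, t, s):
    """Estimate p from (11); R1 is the total sum of ranks of the first s treatments."""
    numerator = n * s * (4 * t - s - 3) - 2 * R1
    denominator = n * s * (5 * s * t - 2 * t * t - 6 * s + 3 * t) - 2 * (2 * s - t) * R1
    return Fraction(numerator, denominator)


def b2_statistic(R1, R2, n, t, s):
    """B_2 of (12), i.e. (6) with m = 2; logarithms to base 10."""
    p = two_group_estimate(R1, n, t, s)
    q = (1 - s * p) / (t - s)  # common rating of the remaining t - s treatments
    # Coefficients of log p and log q in (6).
    A1 = 2 * n * (t - 1) * s - R1 - Fraction(n * s * (s - 1), 2)
    A2 = 2 * n * (t - 1) * (t - s) - R2 - Fraction(n * (t - s) * (t - s - 1), 2)
    return (
        n * s * (t - s) * math.log10(float(p + q))
        - float(A1) * math.log10(float(p))
        - float(A2) * math.log10(float(q))
    )


p = two_group_estimate(R1=19, n=5, t=3, s=1)
B2 = b2_statistic(R1=19, R2=13 + 13, n=5, t=3, s=1)
print(p, float(p), round(B2, 4))
```

Here $p = 1/19 \approx 0.05$ and $(1 - p)/2 \approx 0.47$, the same estimates as are quoted for this judge under the general alternative in § 11, as expected since the last two rank sums are equal. Working from (6), $B_2$ then differs from the quoted $B_1 = 2.917$ only by the within-group term $n \log 2$.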


6. TABLES FOR $B_1$ AND $B_2$


It is possible to generate all combinations of treatment sums of ranks for any given number 
of treatments and repetitions of the paired comparison design. The probability of each such 
combination may be obtained under the null hypothesis of equality of true treatment 
ratings. 

If three items are compared in a single repetition, the possible sets of rank sums are
2, 3, 4 and 3, 3, 3. Each of the six permutations of the elements of the first set has a
probability 1/8, while the probability of the second set is 2/8. The treatment sums of ranks
for two repetitions and three treatments are obtained by adding 2, 3, 4 and 3, 3, 3 in turn
to corresponding elements in the sets of sums of ranks consisting of all permutations of
2, 3, 4 and to 3, 3, 3. In the sets of sums of ranks so produced, all permutations of a given
set of treatment sums of ranks are taken to be equivalent. The probability of a given
permutation is obtained by multiplying the basic probabilities of the combination and the
permutation used to produce the given permutation. The probability of a given new
combination of rank sums is obtained by adding the probabilities obtained for each permutation
of the elements of the combination.
The procedure may be arranged systematically as shown in Table 1. 


Table 1. The generation of treatment sums of ranks and probabilities
for three treatments and two repetitions

Probabilities               1/8       1/8       1/8       1/8       1/8       1/8       2/8
               Rank sums  2, 3, 4   2, 4, 3   3, 2, 4   3, 4, 2   4, 2, 3   4, 3, 2   3, 3, 3
     6/8        2, 3, 4   4, 6, 8   4, 7, 7   5, 5, 8   5, 7, 6   6, 5, 7   6, 6, 6   5, 6, 7
     2/8        3, 3, 3   5, 6, 7   5, 7, 6   6, 5, 7   6, 7, 5   7, 5, 6   7, 6, 5   6, 6, 6










The combination 5, 6, 7, say, appears in its various permutations in nine places in this 
table. In row 1, column 4, for example, 5, 7, 6 appears and its probability is 6/64 obtained by 
multiplying marginal probabilities of row and column. The probability of the combination 
5, 6, 7 is then the sum of the nine individual probabilities and has the value 36/64. When 
three repetitions with three treatments are considered, the generating rows at the top of 
the table are unchanged, but the columns at the left above are replaced by the possible 
combinations of sums of ranks obtained for two repetitions with their corresponding prob- 
abilities. This procedure is continued for larger numbers of treatments and repetitions. 
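
The generation scheme of Table 1 can be reproduced by direct enumeration. The sketch below is one way of doing so and is not the authors' tabular procedure: instead of treating permutations as equivalent at each stage, it convolves the ordered rank sums of single repetitions and only sorts them at the end. The function names are assumptions.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product


def single_repetition(t):
    """Null distribution of the ordered rank sums for one repetition of all pairs."""
    pairs = list(combinations(range(t), 2))
    weight = Fraction(1, 2 ** len(pairs))  # each pair has two equally likely outcomes
    dist = defaultdict(Fraction)
    for outcome in product((True, False), repeat=len(pairs)):
        sums = [0] * t
        for (i, j), i_preferred in zip(pairs, outcome):
            sums[i] += 1 if i_preferred else 2
            sums[j] += 2 if i_preferred else 1
        dist[tuple(sums)] += weight
    return dist


def rank_sum_combinations(t, n):
    """Null probabilities of the combinations (sorted sets) of rank sums."""
    single = single_repetition(t)
    dist = {(0,) * t: Fraction(1)}
    for _ in range(n):  # add one repetition at a time
        nxt = defaultdict(Fraction)
        for s0, p0 in dist.items():
            for s1, p1 in single.items():
                nxt[tuple(a + b for a, b in zip(s0, s1))] += p0 * p1
        dist = nxt
    combos = defaultdict(Fraction)
    for sums, prob in dist.items():
        combos[tuple(sorted(sums))] += prob
    return dict(combos)


for combo, prob in sorted(rank_sum_combinations(3, 2).items()):
    print(combo, prob)
```

For three treatments and two repetitions the combination 5, 6, 7 receives probability 9/16, which is the 36/64 obtained from Table 1.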

When the sets of possible combinations of treatment sums of ranks are obtained with
their probabilities of occurrence, for each such set we may substitute in equations (7), (8)
and (9) and obtain estimates $p_1, \dots, p_t$. The solution of these equations is tedious; in some
cases elementary methods are applicable, in others it is necessary to use repeated
approximations in an iterative procedure. In the later work, many of the procedures have been
programmed on I.B.M. equipment. When we substitute $p_1, \dots, p_t$ in (10), the statistic $B_1$
is evaluated.

Tables for the distribution of $B_1$ for three and four items and up to ten repetitions of the
design are given in Appendix A. The possible sets of treatment sums of ranks are given in
the left-hand columns. Corresponding estimates, $p_1, \dots, p_t$, are then given with the value
of $B_1$. The final column shows significance levels, $P$, in the form of cumulative probabilities.
These probabilities are obtained from the individual probabilities of the possible sets,
accumulated beginning with small values of $B_1$ which are most discordant under the null
hypothesis.

The distribution of the statistic $B_2$ may be recovered from tables for $B_1$. When $t$ and $n$
are specified, it is easy to compute values of $p$ and $B_2$ using (11) and (12). Probabilities may
be obtained by elementary considerations.

$B_2$ is no longer symmetric over the treatments, and certain permutations of treatment
sums of ranks must be considered for each entry in the tables of Appendix A. However,
each such permutation is equally likely, and its probability may be obtained from the
cumulative probabilities for $B_1$. (Sets grouped together with equal values of $B_1$ always have
equal probabilities.)

For any hypothesis for which $B_2$ is the appropriate statistic it is possible to evaluate $p$,
$B_2$ and the corresponding probability of each value of $B_2$ as obtained from the distribution
of $B_1$. Tables for the distribution of $B_2$ will be prepared at a later date and are not presently
available.

7. THE COMBINATION OF EXPERIMENTS 


As noted in the introduction, it may happen that an experiment is performed in groups*
of repetitions of sizes $n_u$ $(u = 1, \dots, g)$, with $\sum_{u=1}^{g} n_u = n$. Two possible methods of performing
an over-all test of significance are available and depend on the specification of the
alternative hypotheses. We shall illustrate these methods with reference to the important special
test (i), noting that similar procedures may be developed for all tests of the general form
specified in § 4.
(i) Pooled analysis 

If an experimenter is willing to assume that true treatment ratings, $\pi_1, \dots, \pi_t$, exist as
the alternative hypothesis for all groups of repetitions, no new analysis is required. Total 
treatment sums of ranks are obtained by addition of corresponding group treatment sums 
of ranks over the g groups. The experiment is treated as though one group of n repetitions 
of the design had been employed and the tables of Appendix A may be used. 


(ii) Combined analysis 
In many cases the alternative hypothesis that the same true ratings exist for all groups 
is not realistic. If the detection of treatment differences is the main concern of the experi- 
menter, a pooled analysis may be inappropriate and even give a non-significant result, 
while each group alone exhibits significant treatment differences. This is particularly likely 
to happen where judge preferences may prohibit the setting up of uniform ranking criteria. 


Let us specify an alternative hypothesis as follows:
(a) Within the $u$th of $g$ groups, true ratings $\pi_{1u}, \dots, \pi_{tu}$, $\sum_{i=1}^{t} \pi_{iu} = 1$, exist, and these ratings
may change from group to group.
(b) Group experiments are independent in probability. Then, in addition, we define
$B_{1u}$ to be the likelihood ratio statistic corresponding to $B_1$ (6) for the $u$th group $(u = 1, \dots, g)$.

The statistics $B_{1u}$ are self-weighting. That is, the groups may be combined and an over-all
test of significance performed depending on a statistic
$$B_1^c = \sum_{u=1}^{g} B_{1u}. \tag{13}$$
This statistic is again a monotone function of the likelihood ratio and does not depend on
values of $n_u$ other than in the evaluation of $B_{1u}$.
Tables for the distribution of $B_1^c$ are discussed in the following section.
One note should be added. The decision to pool or combine group results should be made
from a priori knowledge of group behaviour. When group data may be pooled, it is possible
that the pooled value of $B_1$ exhibits higher significance than the corresponding value of
$B_1^c$. However, it is easy to show that
$$B_1 \ge B_1^c \tag{14}$$
in every situation.
Estimates of the parameters, $p_1, \dots, p_t$, should usually be obtained by groups when groups
are combined, but over-all estimates are available when groups are pooled.

* These groups may represent judges in sensory difference experimentation, different localities,
days, or non-treatment experimental techniques.


8. TABLES FOR $B_1^c$


The probability of a specified value of $B_1^c$ may be simply obtained by elementary probability.
Suppose values of $n_u$ are equal in sets of sizes $g_1, \dots, g_w$, $\sum_{i=1}^{w} g_i = g$, and that values of $B_{1u}$
within these sets are equal in subsets of sizes $g_{i1}, \dots, g_{iv_i}$, $\sum_{j=1}^{v_i} g_{ij} = g_i$ $(i = 1, \dots, w)$. The
probability of a specified value of $B_1^c$ is
$$P(B_1^c) = \prod_{i=1}^{w} g_i! \Big\{ \prod_{j=1}^{v_i} g_{ij}! \Big\}^{-1} \prod_{u=1}^{g} P(B_{1u}). \tag{15}$$

Values of $B_{1u}$ and $P(B_{1u})$ may be obtained from the table of Appendix A. $B_1^c$ is calculated
by addition as in (13), and its probability is evaluated by use of (15).

Using the results above, we have computed tables for $B_1^c$ for certain experiments wherein
there are equal numbers of repetitions in each group. Only values at approximately the
0.10 level of significance or higher have been recorded, and these are selected for easy
interpolation. These tables are shown in Appendix B.
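
Formula (15) is a product of multinomial coefficients and individual probabilities, and the following sketch evaluates it for one specified set of group results. The function name and the input layout are assumptions. The illustration uses $t = 3$ with two groups of one repetition each; the probabilities 6/8 and 2/8 come from § 6, and the corresponding values $B_1 = 0$ and $B_1 = 3 \log 2 = 0.903$ were worked out from (10) for this illustration rather than read from Appendix A.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def combined_set_probability(groups):
    """Evaluate (15) for one specified set of group results.

    groups is a list of (n_u, B_1u, P(B_1u)) triples, one per group. B_1u is
    used only as a label to detect equal values, so it should be passed rounded.
    """
    coefficient = Fraction(1)
    # g_i! for each set of groups sharing the same number of repetitions.
    for g_i in Counter(n_u for n_u, _, _ in groups).values():
        coefficient *= factorial(g_i)
    # Divide by g_ij! for each subset that also shares the same value of B_1u.
    for g_ij in Counter((n_u, b) for n_u, b, _ in groups).values():
        coefficient /= factorial(g_ij)
    probability = Fraction(1)
    for _, _, p in groups:
        probability *= p
    return coefficient * probability


ordered = (1, 0.0, Fraction(6, 8))     # rank sums a permutation of 2, 3, 4
circular = (1, 0.903, Fraction(2, 8))  # rank sums 3, 3, 3

print(combined_set_probability([ordered, ordered]))    # B_1^c = 0
print(combined_set_probability([ordered, circular]))   # B_1^c = 0.903
print(combined_set_probability([circular, circular]))  # B_1^c = 1.806
```

In this small case each value of $B_1^c$ arises from a single set of group values, and the three probabilities sum to one.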


9. A COEFFICIENT OF AGREEMENT 


A measure of consistency of ranking from group to group is naturally provided by the
difference between the pooled value of $B_1$ and $B_1^c$. Small values of $B_1 - B_1^c$ (note that
$B_1 - B_1^c \ge 0$) will exhibit good agreement in ranking from group to group, while large values
indicate discordant rankings.

If we set up the hypotheses
$$H_0: \pi_{iu} = \pi_i \quad (u = 1, \dots, g;\ i = 1, \dots, t)$$
and
$$H_a: \pi_{iu} \quad (u = 1, \dots, g;\ i = 1, \dots, t) \text{ unrestricted by groups}, \tag{16}$$
then
$$-2 \log_e \lambda = 2 (B_1 - B_1^c) \log_e 10, \tag{17}$$
where $\lambda$ is the likelihood ratio statistic for comparison of $H_0$ and $H_a$. $B_1 - B_1^c$ is then a
monotone function of the likelihood ratio statistic.

The distribution of $B_1 - B_1^c$ for small samples will depend on parameters $\pi_1, \dots, \pi_t$ under
$H_0$ and is therefore not a parameter-free test. A conditional test, which is exact, may be
formed and has some value. Suppose $B_1^c$ is fixed at the observed value. Corresponding to $B_1^c$
we have group sums of ranks. If there is no agreement from group to group, all permutations
of group sums of ranks are equally likely, and for each permutation a pooled value of $B_1$,
and consequently the difference $B_1 - B_1^c$, may be obtained. Thus for fixed $B_1^c$ the distribution
of $B_1 - B_1^c$ can be derived under an assumption of no agreement from group to group. This
conditional test reverses the hypotheses (16), and small values of $B_1 - B_1^c$ show significant
agreement from group to group.
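
The conditional test just described can be sketched as follows. This is an illustration under assumptions and not the authors' computation: the function names are ours, $B_1$ is obtained by a fixed-point iteration on (7) with renormalization by (8), every treatment is assumed to be preferred at least once, and the group rank sums are those of the two judges of § 11.

```python
import math
from itertools import permutations, product


def b1_statistic(rank_sums, n, tol=1e-12, max_iter=100_000):
    """B_1 of (10) from treatment sums of ranks, solving (7) and (8) iteratively.

    Assumes every a_i = 2n(t-1) - (rank sum of treatment i) is positive.
    """
    t = len(rank_sums)
    a = [2 * n * (t - 1) - r for r in rank_sums]
    p = [1.0 / t] * t
    for _ in range(max_iter):
        new = [
            a[i] / (n * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
            for i in range(t)
        ]
        total = sum(new)
        new = [x / total for x in new]
        converged = max(abs(x - y) for x, y in zip(new, p)) < tol
        p = new
        if converged:
            break
    pair_term = n * sum(
        math.log10(p[i] + p[j]) for i in range(t) for j in range(i + 1, t)
    )
    return pair_term - sum(a_i * math.log10(p_i) for a_i, p_i in zip(a, p))


def agreement_differences(group_sums, group_n):
    """B_1 - B_1^c for every relabelling of the treatments in groups 2, ..., g.

    The first entry corresponds to the observed labelling.
    """
    b1c = sum(b1_statistic(s, n) for s, n in zip(group_sums, group_n))
    n_total = sum(group_n)
    diffs = []
    for perms in product(*(permutations(s) for s in group_sums[1:])):
        pooled = [sum(column) for column in zip(group_sums[0], *perms)]
        diffs.append(b1_statistic(pooled, n_total) - b1c)
    return diffs


diffs = agreement_differences([(19, 13, 13), (13, 15, 17)], [5, 5])
observed = diffs[0]
# Small differences indicate agreement, so the conditional level is the
# proportion of relabellings giving a difference no larger than the observed one.
p_value = sum(d <= observed + 1e-9 for d in diffs) / len(diffs)
print(round(observed, 3), p_value)
```

For these two judges the observed difference is the largest that any relabelling produces, so the conditional test gives no indication of agreement.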


A large sample test of the hypotheses of (16) is discussed in the following section.

10. LARGE SAMPLE DISTRIBUTIONS


If $\lambda$ is the likelihood ratio, it is known (Wilks, 1946, pp. 150-2, § 7.2) that $-2 \log_e \lambda$ is
distributed as $\chi^2$ under very general conditions. This result can be employed in the special
cases considered above.


Case (i). In the first special test (§ 5)
$$-2 \log_e \lambda_1 = nt(t-1) \log_e 2 - 2 B_1 \log_e 10 \tag{18}$$
is distributed in the limit as $\chi^2_{t-1}$, i.e. as $\chi^2$ with $t - 1$ degrees of freedom. (It has been noted
that $B_1$ as tabled is a linear function of logarithms to base 10.)

The authors have been unsuccessful in an attempt to evaluate the moments of $-2 \log_e \lambda_1$
by theoretical methods. However, numerical values of the mean and variance of this
statistic have been computed for small numbers of items and repetitions. These are given
in Table 2.

Table 2. Mean and variance of $-2 \log_e \lambda_1$

              t = 3                t = 4
   n      Mean   Variance      Mean   Variance
   1      3.12     3.24        4.55     9.96
   2      3.39     7.27        3.59     9.51
   3      2.80     7.50        3.33     7.80
   4      2.54     6.51        3.22     7.10
   5      2.40     5.83        3.13     6.66
   6      2.32     5.38         -        -
   7      2.27     5.15         -        -
   8      2.23     4.95         -        -
   9      2.20     4.82         -        -
  10      2.18     4.71         -        -
   ∞      2.00     4.00        3.00     6.00




















It may be observed that even for these numbers of items and repetitions there is definite
evidence of rapid convergence to the limiting values for the means and somewhat slower
convergence for the variances. For small samples, on the average $-2 \log_e \lambda_1$ will be a little
too large, and use of the large sample approximation will tend to lead to the announcement
of too many significant results. The approximation appears to be fairly good for practical
purposes if the number of repetitions is not too small (say $n > 15$).
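
The first entry of Table 2 can be verified by hand, and the sketch below does so for $t = 3$, $n = 1$. The two possible values of the statistic are our own working from (18) and the rank sums of § 6, not figures quoted by the authors.

```python
import math

# t = 3, n = 1 under the null hypothesis:
#   rank sums 3, 3, 3 (probability 2/8): the estimates are all 1/3, the
#       likelihood ratio is 1 and the statistic is 0;
#   a permutation of 2, 3, 4 (probability 6/8): the maximized likelihood tends
#       to 1 while the null likelihood is (1/2)**3, so the statistic is 6 log_e 2.
outcomes = [(0.0, 2 / 8), (6 * math.log(2), 6 / 8)]

mean = sum(x * p for x, p in outcomes)
variance = sum(x * x * p for x, p in outcomes) - mean ** 2
print(round(mean, 2), round(variance, 2))  # 3.12 3.24
```

Both values exceed or differ from the limiting mean 2 and variance 4 of $\chi^2_2$, in line with the remark that the statistic tends to be too large in small samples.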

We may note that the computations are fairly difficult if the approximate test must be
used. To compute $B_1$ it is necessary to solve equations (7) and (8) and substitute in the
formula (10) for $B_1$. The equations are most easily solved by obtaining a first approximation
by comparison with available tables (multiples of $\sum r_1, \dots, \sum r_t$ yield identical estimates
$p_1, \dots, p_t$) and using an iterative procedure. The iterations consist of obtaining second
approximations such as

$$p_1^{(2)} = \frac{a_1}{n} \Big[ \frac{1}{p_1^{(1)} + p_2^{(1)}} + \dots + \frac{1}{p_1^{(1)} + p_t^{(1)}} \Big]^{-1},$$

where the superscript in parentheses indicates the order of iteration.
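
A minimal sketch of such an iteration follows. It is not the authors' tabular procedure: it starts from equal ratings rather than from a tabled first approximation, renormalizes by (8) after each pass, and assumes that every treatment is preferred at least once (all $a_i > 0$). The function names are assumptions; the rank sums are those of the worked example in § 11.

```python
import math


def fit_ratings(rank_sums, n, tol=1e-12, max_iter=100_000):
    """Solve (7) and (8) by repeated substitution, starting from equal ratings."""
    t = len(rank_sums)
    a = [2 * n * (t - 1) - r for r in rank_sums]  # a_i of (9)
    p = [1.0 / t] * t
    for _ in range(max_iter):
        # Rearranged form of (7): p_i = a_i / (n * sum_j 1 / (p_i + p_j)).
        new = [
            a[i] / (n * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
            for i in range(t)
        ]
        total = sum(new)
        new = [x / total for x in new]  # impose (8)
        converged = max(abs(x - y) for x, y in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p


def b1_from_ratings(p, rank_sums, n):
    """B_1 of (10), logarithms to base 10."""
    t = len(p)
    a = [2 * n * (t - 1) - r for r in rank_sums]
    pair_term = n * sum(
        math.log10(p[i] + p[j]) for i in range(t) for j in range(i + 1, t)
    )
    return pair_term - sum(a_i * math.log10(p_i) for a_i, p_i in zip(a, p))


for label, sums, n in [
    ("judge 1", (19, 13, 13), 5),
    ("judge 2", (13, 15, 17), 5),
    ("pooled", (32, 28, 30), 10),
]:
    p = fit_ratings(sums, n)
    print(label, [round(x, 2) for x in p], round(b1_from_ratings(p, sums, n), 3))
```

The results can be compared with the estimates and values of $B_1$ quoted from Appendix A in § 11.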


For the combined test, from the additive property of $\chi^2$, the limiting distribution of
$$-2 \log_e \lambda_1^c = -2 \sum_{u=1}^{g} \log_e \lambda_{1u} = nt(t-1) \log_e 2 - 2 B_1^c \log_e 10 \tag{19}$$
is that of $\chi^2$ with $g(t-1)$ degrees of freedom. The notation in (19) corresponds to that
of § 8(b).
If we consider the test specification (16), a parameter-free test of agreement may be
formed for the large sample distribution. It follows that $2(B_1 - B_1^c) \log_e 10$ has the $\chi^2$
distribution with $(g-1)(t-1)$ degrees of freedom in the limit. Large values of this statistic
show discordant ranking from group to group.
The large sample test may be summarized as in Table 3. 


Table 3. Large sample analysis
(Note that $\log_e 2 = 0.69315$ and $2 \log_e 10 = 4.60518$)

Statistic                                   Hypotheses ($H_0$)            Limiting distribution
$nt(t-1) \log_e 2 - 2 B_1 \log_e 10$          $\pi_i = 1/t$                 $\chi^2_{t-1}$
$2(B_1 - B_1^c) \log_e 10$                    $\pi_{iu} = \pi_i$            $\chi^2_{(g-1)(t-1)}$
$nt(t-1) \log_e 2 - 2 B_1^c \log_e 10$        $\pi_{iu} = 1/t$              $\chi^2_{g(t-1)}$

Case (ii). In the second special test, the statistic
$$-2 \log_e \lambda_2 = 2ns(t-s) \log_e 2 - 2 B_2 \log_e 10 \tag{20}$$
has in the limit the distribution of $\chi^2$ with one degree of freedom.


11. THE EXPERIMENTAL PROCEDURE AND ANALYSIS ILLUSTRATED* 


In a recent taste-testing experiment, pork roasts were compared by ranking in pairs on 
their flavour characteristics. The roasts were obtained from three groups of hogs which had 
been fattened on three different rations: corn (maize), corn plus a peanut supplement, and 
corn plus a large peanut supplement. The object was to determine whether the addition of 
peanuts to the diet was recognizable in the fresh-pork roasts or not. One would like to ask 
expert judges to rank pairs on the basis of flavour attributable to the peanut diet; however, 
this characteristic proved too intangible to define, and each judge was asked to rank pairs 
on the basis of his own preferences. This leads a priori to a combined analysis (§ 8) for the 
experimental data. 

When a new procedure is proposed, it is useful for applied work to show a systematic
listing of the steps involved. We now indicate these steps with reference to the results of
two of the several judges used in the experiment described above. Each judge performed
five repetitions of the paired design ($t = 3$, $n = 5$).

* The illustrative example is taken from preliminary experimental results of L. L. Davis, C. M.
Kincaid and H. R. Thomas at the Virginia Agricultural Experiment Station.


Procedure 


Step 1 (experimental). A competent panel of judges was selected and so instructed that 
they all had experience with the experimental material. 

Step 2. For each judge and for each repetition six small containers were coded. Two 
samples from roasts from each of the three treatment groups of animals were placed in the 
containers and the three requisite pairs formed. Code numbers were recorded and the pairs 
presented to the judges in a random order together with score cards. 

Step 3. For each pair a judge tasted each sample and recorded the value 1 for the sample 
preferred and 2 for the other sample. 

Step 4 (analysis). The experimenter collected and decoded the data for each judge and
recorded the results as follows. C denotes the corn ration, Cp the corn plus peanut
supplement ration, and CP the corn plus large peanut supplement ration. The treatment sums of
ranks, $\sum r_i$, for C, Cp, CP are respectively 19, 13, 13 and 13, 15, 17 for the two judges.


Table 4. Rankings for two judges in the pork experiment

Repetition          1             2             3             4             5
Pair            C  Cp  CP     C  Cp  CP     C  Cp  CP     C  Cp  CP     C  Cp  CP

Judge 1
C, Cp           2   1   -     2   1   -     2   1   -     2   1   -     2   1   -
C, CP           2   -   1     1   -   2     2   -   1     2   -   1     2   -   1
Cp, CP          -   ?   ?     -   ?   ?     -   ?   ?     -   ?   ?     -   ?   ?

Judge 2
C, Cp           2   1   -     2   1   -     1   2   -     1   2   -     1   2   -
C, CP           1   -   2     1   -   2     1   -   2     1   -   2     2   -   1
Cp, CP          -   1   2     -   ?   ?     -   ?   ?     -   2   1     -   1   2

(? marks an entry that is illegible in the source copy.)











Step 5. Since it was agreed that the results of the two judges should be combined, we enter
the table of Appendix A at $n = 5$. For judge 1, $p_C = 0.05$, $p_{Cp} = 0.47$, $p_{CP} = 0.47$, $B_1 = 2.917$,
the significance level is 0.057; for judge 2, $p_C = 0.53$, $p_{Cp} = 0.30$, $p_{CP} = 0.17$, $B_1 = 4.034$
and the significance level is 0.404.

Step 6. The combined statistic $B_1^c$ of equation (13) was obtained and has the value
2.917 + 4.034 = 6.951. From the table of Appendix B under the two equal groups, $n = 10$,
the significance level for the combined test was found to be 0.069. It was concluded that it
had not been demonstrated that ration differences detectable by these judges were present
at any usual significance level.

Step 6a. If a decision to pool the data had been made, treatment sums of ranks added
over the judges would have been 32, 28, 30, and the table of Appendix A would have been used
for $n = 10$. We would have found $p_C = 0.24$, $p_{Cp} = 0.43$, $p_{CP} = 0.32$, $B_1 = 8.797$ and the
significance level would have been 0.630. Since it seems extremely unlikely on the basis of
this method that treatment differences are present, it is not here meaningful to compare
treatments by use of their estimated ratings.

$B_1 - B_1^c = 8.7973 - 6.9516 = 1.8459$ and is indicative of poor agreement of the preferences
of the two judges. In fact use of the large sample approximation (Table 3) gives $\chi^2 = 8.50$
with 2 degrees of freedom, a result significant at the 2 % level.
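
The three large-sample statistics of Table 3 can be evaluated for this example as in the sketch below. The values of $B_1$ and $B_1^c$ are those quoted above; the function name is an assumption, and the tail probability uses the elementary closed form that holds for $\chi^2$ with an even number of degrees of freedom, which covers the cases $t = 3$ needed here.

```python
import math


def chi2_upper_tail_even(x, df):
    """Upper-tail probability of chi-square with a positive even number of d.f."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


n, t, g = 10, 3, 2                       # total repetitions, treatments, judges
b1_pooled, b1_combined = 8.7973, 6.9516  # values quoted in the text

pooled_stat = n * t * (t - 1) * math.log(2) - 2 * b1_pooled * math.log(10)
agreement_stat = 2 * (b1_pooled - b1_combined) * math.log(10)
combined_stat = n * t * (t - 1) * math.log(2) - 2 * b1_combined * math.log(10)

for label, stat, df in [
    ("pooled treatments", pooled_stat, t - 1),
    ("agreement", agreement_stat, (g - 1) * (t - 1)),
    ("combined treatments", combined_stat, g * (t - 1)),
]:
    print(f"{label}: chi-square = {stat:.2f}, d.f. = {df}, "
          f"P = {chi2_upper_tail_even(stat, df):.3f}")
```

The pooled and agreement statistics add to the combined statistic, in keeping with the additive property of $\chi^2$ used for (19).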


12. DISCUSSION


The authors have not attempted to obtain the power of this rank-order test procedure. The 
method is clear, but any consideration of exact power would require tables for each specified 
set of parameters of the alternative hypotheses of substantially greater complexity than 
those for the null distributions. In addition, the simplifications due to symmetry over 
treatments in certain null cases would disappear. The merits of the test procedure are then 
dependent on the properties of the maximum-likelihood methods used. 

Experiments using the above methods at the Virginia Polytechnic Institute and else- 
where have been satisfactorily conducted. The simplicity and appropriateness of the 
experimental design, together with the simplicity of the analysis, wherein one has only to 
add small integers and consult prepared tables, appear to be important factors in the appeal 
of the methods. The comprehensive tables already prepared are easy to read, and the 
extension of these tables is proceeding. New computing equipment is expected to speed 
the tabling work. 

One of the questions asked in connexion with this work pertains to the possibility of 
extending the analysis to incomplete block designs with larger block sizes. We are proceeding 
with a consideration of such extensions. The method of paired comparisons becomes in- 
efficient where it is possible to rank more than two treatments at a time and where more 
than a few items or treatments are considered. 

Apart from the application of the theoretical considerations for the methods of this 
paper, it is to be observed that the probabilities of the tables of Appendix A may be useful 
elsewhere. Whenever ranking methods are used in incomplete blocks of size 2, tests of null 
hypotheses of treatment equality will depend on the probabilities tabled. The probabilities 
of individual sets of treatment rank sums may be recovered from the cumulative
probabilities given, since all sets bracketed together have equal individual probabilities (that
is, sets of rank sums with identical values of $B_1$ also have identical probabilities). Publication
of the totality of possible sets of sums of ranks in Appendix A is necessary for use with
Appendix B, and further desirable in that they may form a basis for future tables for methods
yet to be devised.


13. SUMMARY 


A method of analysis of paired comparisons is provided which permits tests of hypotheses 
of a general class and the estimation of treatment ratings or preferences. The mathematical 
model developed is simple and easy to interpret and apply. Ranks are used in incomplete
blocks of size 2, and such ranking will permit later generalization to larger block sizes. The 
method of maximum likelihood is employed and tests depend on the likelihood ratio 
statistics. Two special tests are featured and test the null hypothesis that true treatment 
ratings are equal. The alternative hypothesis (i) makes no assumptions of equality of
treatment ratings and (ii) makes the assumption that there are only two groups of
treatments wherein within group treatments do not differ in ratings but the two groups
themselves may have different ratings.

The procedures shown are applicable in most problems where qualitative measurements 
alone are reliable and are particularly useful in problems involving subjective ranking by 
a small panel of judges for the detection of differences. Methods of pooling and of combining 
the results of several judges are given. The method of combining permits an over-all test 
of significance without the usual assumption that members of a panel agree upon the 
nature of the differences to be detected. The decision to pool or to combine is made on the 
basis of a priori knowledge of judge behaviour. If results are combined, estimates of treat- 
ment ratings are usually obtained for judges individually, although average estimates for 
the group of judges may be obtained by reverting to the pooled analysis for this special 
purpose. An example from taste testing is given. 

The large sample distributions of the statistics are discussed, and tables for the exact 
test procedures are shown in the two appendices following. 


In conclusion, the authors would like to express their appreciation to Prof. Lyle L. Davis, 
food technologist, for advice on experimental techniques and for trial experimentation. 
We would also acknowledge the computational and clerical assistance of Mr A. F. Teske, 
Mrs T. S. Russell, Mrs F. A. Spracher, Mrs M. H. Kirkpatrick and Mrs A. L. Ruiz.


TABLES FOR THE RANK ANALYSIS OF INCOMPLETE BLOCK DESIGNS


APPENDIX A. The distribution of the likelihood ratio for general alternatives 


The following table gives the values of the likelihood ratio statistic, $B_1$, and the likelihood
estimates of the true treatment ratings, $p_1, \ldots, p_t$, together with probabilities, P, that $B_1$
will not be exceeded if the null hypothesis is true. Since low values of $B_1$ indicate discordant
results, P gives the significance level.

n is the number of repetitions of the design and t the number of treatments. The design
symbols, t or v, $\lambda$, b, r, k are standard and as used, for example, by Fisher & Yates (1948,
p. 17, Introduction to Table XVIII) and Cochran & Cox (1950, pp. 270 and 304). $\lambda$ in this
design description should not be confused with the same symbol generally used to indicate
a likelihood ratio. Parentheses contain combinations with equal values of $B_1$. $\sum r_i$ is the
sum of ranks for treatment i.

In setting up the table, several conventions have been adopted to simplify the printing: 
(i) Where $p_1$ is unity and the remaining probabilities are therefore all zero the result is given
as 1 —— or 1 —-———. (ii) The lowest value of B, possible for each n is zero and is printed 
as 0. (iii) Where there are no entries in the final column above a single entry of 0.0000, the
corresponding values of P are less than a half unit in the fourth decimal place.


3 treatments. (Design: t = 3, $\lambda$ = 1, b = 3, r = 2, k = 2)


APPENDIX B. The distribution of the combined likelihood ratio for general alternatives


This table gives selected values of the combined likelihood ratio statistics, $B_1^c$, together
with cumulative probabilities (levels of significance) associated with each tabular value.
Only groups of equal size are considered. Thus if n is the total number of repetitions of the
design and three groups are considered, each group contains n/3 repetitions.
$B_1^c$ is computed by adding values of $B_1$ for each group as obtained from Appendix A. Only
values significant at approximately the 0.10 level of significance or higher are recorded and
these are selected for easy interpolation.


3 treatments. (Design: t = 3, $\lambda$ = 1, b = 3, r = 2, k = 2)

Each column lists values of $B_1^c$ with the associated cumulative probability P.

2 equal groups

     n = 4              n = 6              n = 8              n = 10
$B_1^c$     P       $B_1^c$     P       $B_1^c$     P       $B_1^c$     P
 0        0.009      0.8293   0.002      2.4683   0.001      4.3987   0.001
 0.6021   0.044      1.6586   0.007      3.4452   0.006      5.2447   0.005
 1.2042   0.079      1.8402   0.010      3.9738   0.013      5.6469   0.010
 1.4984   0.185      2.5112   0.026      4.1686   0.019      6.2895   0.020
                     2.7093   0.048      4.3620   0.025      6.4123   0.030
                     2.9064   0.078      4.4424   0.034      6.6731   0.041
                     3.3405   [illegible] 4.5526  0.041      6.8013   0.052
                     [illegible]         4.6696   0.050      6.9250   0.063
                                         4.9366   0.065      6.9514   0.069
                                         5.0812   0.074      7.0179   0.082
                                         5.2422   0.086      7.0751   0.089
                                         5.4652   0.123      7.1656   0.107

4 equal groups

     n = 8              n = 12             n = 16             n = 20
$B_1^c$     P       $B_1^c$     P       $B_1^c$     P       $B_1^c$     P
 1.2042   0.003      4.3281   0.001      7.9978   0.001     11.7538   0.001
 1.8063   0.007      5.1807   0.005      8.9950   0.005     12.7753   0.005
 2.1005   0.019      5.5759   0.010      9.4390   0.010     13.2106   0.010
 2.4084   0.023      6.0498   0.020      9.9076   0.020     13.6708   0.020
 2.7026   0.045      6.3897   0.030     10.0686   0.026     13.8394   0.025
 2.9968   0.062      6.6266   0.039     10.1624   0.030     13.9669   0.030
 3.0104   0.068      6.6810   0.048     10.3965   0.040     14.1835   0.040
 3.3047   0.092      6.8791   0.061     10.5843   0.059     14.3529   0.050
 3.5989   0.159      7.0209   0.069     10.7074   0.060     14.4811   0.060
                     7.0995   0.080     10.8684   0.070     14.6057   0.069
                     7.2578   0.105     10.9854   0.080     14.7232   0.079
                                        11.0658   0.092     14.8393   0.089
                                        11.1320   0.101     14.9368   0.100

3 equal groups

     n = 6              n = 9              n = 12             n = 15
$B_1^c$     P       $B_1^c$     P       $B_1^c$     P       $B_1^c$     P
 0        0.001      2.5112   0.001      5.3727   0.001      8.0688   0.001
 1.2042   0.016      2.7093   0.002      6.2191   0.005      9.0642   0.005
 1.4984   0.030      3.3405   0.005      6.6031   0.010      9.4794   0.010
 1.8063   0.041      3.7357   0.011      7.0209   0.021      9.8945   0.020
 2.1005   0.101      4.1698   0.022      7.1379   0.026     10.0456   0.025
                     4.7466   0.052      7.6341   0.049     10.5704   0.052
                     4.9835   0.063      7.8146   0.059     10.6882   0.060
                     5.0224   0.071      7.9079   0.068     10.8049   0.066
                     5.2205   [illegible] 8.0181  0.083     10.9031   0.080
                                         8.0781   0.088     10.9594   0.090
                                         8.1351   0.100     10.9858   0.097

5 equal groups

     n = 10             n = 15             n = 20             n = 25
$B_1^c$     P       $B_1^c$     P       $B_1^c$     P       $B_1^c$     P
 1.8063   0.001      6.0498   0.001     10.7582   0.001     15.4283   0.001
 2.4084   0.004      7.0762   0.005     11.8177   0.005     16.4902   0.005
 2.7026   0.009      7.5103   0.010     12.2791   0.010     16.9568   0.010
 3.3047   0.022      8.0715   0.019     12.8047   0.020     17.4901   0.020
 3.5989   0.038      8.2687   0.026     12.9851   0.025     17.6727   0.025
 4.2010   0.083      8.7425   0.049     13.5254   0.049     18.2179   0.050
 4.2146   0.085      8.9174   0.060     13.6443   0.060     18.3626   0.060
 4.4952   0.101      9.0592   0.068     13.7889   0.070     18.4900   0.069
                     9.1543   0.080     13.9161   0.080     18.6137   0.080
                     9.2961   0.089     14.0263   0.089     18.7191   0.090
                     9.3349   0.101     14.1000   0.100     18.7831   0.100


STUDIES IN STATISTICAL ECOLOGY


1. SPATIAL PATTERN 


By J. G. SKELLAM 


The Nature Conservancy, London 


1. INTRODUCTION 


1.0. In the world of organic nature there seems to exist an uneasy balance between the
factors which increase randomness and those that oppose it. 

This is particularly true of the distribution in space of animals and plants. The broad 
outlines of the pattern are determined by the main structural features of the physical 
environment. But even under constant conditions neither uniformity nor complete 
randomness prevail. 

On the one hand the reproduction of organisms and the interactions between them tend 
to develop a closely knit pattern; whilst on the other, locomotory movements and dispersive 
processes bring about an ever-increasing randomness. An ecological complex of interacting 
species is a dynamical system, which may not only display a regular seasonal rhythm, but 
also appears liable by reason of its intrinsic nature to undergo oscillations (Volterra, 1931) 
or cyclical changes (Watt, 1947), all of which are liable to be disturbed in an irregular manner 
by apparently unpredictable fluctuations in weather conditions or by the spasmodic 
arrival of additional components to the system from outside. 

In order to study the community quantitatively or to assess the densities and abundances 
of living organisms in their habitats, ecologists have found it profitable to sample the 
space in which the organisms occur, and to record the composition of each sample. In this 
first paper we are concerned mainly with the distribution of the number of individuals per
sample, for which the term census distribution is proposed. The illustrations given are
drawn from observations on plant species for which the most convenient method of sampling
is the marking out of quadrats on the ground. Clearly census distributions are discrete and 
can only be applied to species which consist of individuals or clearly defined aerial shoots 
each of which can be regarded for the purpose as an individual. 

Census distributions are important for two reasons. First, they provide estimates of the 
density of the individuals in the region sampled together with information relating to the 
reliability of the estimates. Secondly, they contribute to our understanding of certain 
aspects of the pattern or arrangement of the individuals in space. 

It is unfortunate, however, that the use of probability generating functions should not 
have featured more prominently in the literature on these and related topics, for by means 
of them the subject under consideration can be given greater unity and understanding. 
Many statistical results already deduced with much labour by the pioneers of quantitative 
ecology can be derived immediately by this method, and the way opened for further 
generalization and development. 
2. NOTATION


2.1. $G(z) = \sum p_r z^r$ is a probability generating function. So are $g(z)$ and $\gamma(z)$.

$\phi(t) = G(e^t) = \sum \mu'_r t^r/r!$ is an ordinary moment generating function.
$\Phi(u) = G(1+u) = \sum \mu_{(r)} u^r/r!$ is a factorial moment generating function.
$\psi(t) = \log \phi(t) = \sum \kappa_r t^r/r!$ is an ordinary cumulant generating function.
$\Psi(u) = \log \Phi(u) = \sum \kappa_{(r)} u^r/r!$ is a factorial cumulant generating function.


3. PATTERN AND PROCESS 


3.1. It is universally realized by ecologists that the frequency distribution of the number
of individuals of a particular species per quadrat is the natural outcome of the spatial 
arrangement of those individuals. If, for example, a number of points are distributed over 
an area in accordance with some scheme of laws (which may or may not involve notions of 
probability) it is possible, assuming sufficient mathematical knowledge, to deduce the 
probability distribution of the number of individuals per quadrat laid down at random.

But though the passage from cause to effect involves no special logical problem, the 
reverse process does. Unfortunately, we cannot with any certainty arrive at an understanding
of the spatial arrangement of the points from a knowledge of the frequency distribution
alone. One purpose of the present paper is to enumerate some of the more important
probability distributions to be expected on certain reasonable physical models.
It will be seen in a number of cases that two fundamentally distinct models may give rise 
to the same probability distribution, and in consequence no statistical analysis whatsoever 
can discriminate between them. 

Fundamentally the limitations are inherent in the method itself. If further advances are 
to be made, the method must be extended and developed so as to incorporate additional 
information of a somewhat different kind. Such can be gained, for example, by employing 
quadrats of different sizes, by studying the relationships between the numbers in nearby 
squares, and the distances between individuals and their nearest neighbours. 

If it is at all obvious from general observation that the arrangement of individual plants 
approximates to some simple model, then it may be possible to give a physical meaning to 
the parameters of the corresponding probability distribution. But if nothing is known 
a priori concerning the spatial pattern, the parameters have only descriptive value in a 
somewhat remote sense, and no great purpose is at present served in estimating them. 

For the same reason I reject the use in this connexion of the Gram-Charlier Type B series
and the other series developments of § 3.17, at the same time recognizing their value in
other ways (§ 3.18).


3.2. Consider a wide expanse of exposed open ground of a uniform character such as
would be provided by the muddy bed of a recently drained shallow lake, and consider the 
disposition of the independently dispersed wind-borne seeds of one of the species which will 
colonize the area. That the number occurring in a quadrat square marked on the surface is 
a Poisson variate is seen from the fact that there are many such seeds each with an extremely 
small chance of falling into the quadrat. This result was pointed out in analogous circumstances
by Student (1907) in connexion with haemacytometer counts, and has been used by
Blackman (1935) and more recently by Barnes & Stanbury (1951). The distribution of the
number of seeds per quadrat then has p.g.f.

$$G(z) = e^{\lambda(z-1)}. \tag{1}$$


The probability distribution of the total number of seeds falling into an area is Poissonian 
even if the process takes place irregularly over a long period, for then 


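$$G(z) = \prod_j e^{\lambda_j(z-1)} = \exp\{\lambda(z-1)\},$$

where $\lambda = \sum_j \lambda_j$. Alternatively,

$$G(z) = \exp\left\{\int_0^T (z-1)\,\lambda(t)\,dt\right\} = \exp\{\lambda(z-1)\}, \tag{2}$$

where $\lambda(t)$ is a rate, and $\lambda = \int_0^T \lambda(t)\,dt$.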
3.3. Suppose now that the probability that a seed germinates is p (with $q = 1-p$) and that
the seeds are not sufficiently packed together to interact at this stage. The distribution of
the resulting seedlings will, by a well-known theorem (Watson, 1889; Fisher, 1922; Haldane, 1927),
have p.g.f. $G(pz+q) = e^{\lambda(pz+q-1)} = e^{\lambda p(z-1)}$.

Clearly then, as long as the fates of the individuals remain independent of one another, 
the Poisson form persists, and the survivors continue to be distributed at random. 
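
The persistence of the Poisson form under random survival can be checked numerically. The sketch below is illustrative only: the values `lam = 4.0` and `p = 0.3`, the function names and the truncation point `n_max` are assumptions, not taken from the text. It sums the binomial survival probabilities over a Poisson number of seeds and compares the result with a Poisson distribution of mean $\lambda p$.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variate, computed on the log scale to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def thinned_pmf(k, lam, p, n_max=120):
    """P(k survivors) when a Poisson(lam) number of seeds each survive with probability p.

    The sum over the possible seed counts n >= k is truncated at n_max (an assumption
    that is harmless when lam is small compared with n_max).
    """
    q = 1.0 - p
    return sum(
        poisson_pmf(n, lam) * math.comb(n, k) * p ** k * q ** (n - k)
        for n in range(k, n_max + 1)
    )

lam, p = 4.0, 0.3  # illustrative values, not from the paper
max_gap = max(abs(thinned_pmf(k, lam, p) - poisson_pmf(k, lam * p)) for k in range(15))
print(f"largest discrepancy over k = 0..14: {max_gap:.2e}")
```

The discrepancy should be at the level of floating-point rounding, which is the numerical counterpart of $G(pz+q) = e^{\lambda p(z-1)}$.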

Though adverse conditions have profound effects on population numbers they do not 
appear to affect the functional form of the more important distributions which we shall 
consider.

For suppose that the individuals of a species are exposed independently to the risk of 
extermination with probability $q = 1-p$, and we take samples of the space in which they
live. Instead of the distribution $G(z)$ we obtain $G(pz+q)$. Consequently whenever $G(z)$ has
the form $F(a_0 + a_1 z)$ that functional form is unchanged. This property of functional invariance
under random selection is possessed by all the elementary discrete distributions.


3.4. For the Poisson distribution $\kappa_r = \lambda$ (all r),

$$\kappa_{(r)} = 0 \quad (r > 1), \qquad \kappa_{(r)} = \lambda \quad (r = 1).$$
The ratio $\mu_2/\mu'_1 = 1 + \kappa_{(2)}/\kappa_{(1)}$ has been used by Clapham (1936) and other botanists as
a standard for the comparison of census distributions. In the case of the Poisson distribution
this ratio = 1, and the individual plants were then regarded as being located at random.
The plants were said to be over-dispersed when the ratio was greater than unity and under-dispersed
when it was less than unity. The quantity $\sum_{j=1}^{n} (x_j - \bar{x})^2/(\bar{x}[n-1])$ was used as an
estimator of $\mu_2/\mu'_1$. Here $\bar{x}$ denotes $\sum_{j=1}^{n} x_j/n$.


The statistic $\sum_{j=1}^{n} (x_j - \bar{x})^2/\bar{x}$, as originally suggested by Fisher (1925), is commonly used to
test the significance of the departure of an observed set of values $(x_1, x_2, \ldots, x_n)$ from the
Poisson form. This statistic (the index of dispersion) is treated as a $\chi^2$ variate with $n-1$
degrees of freedom. It is not defined for $\bar{x} = 0$, but as Haldane (1939, p. 350) has shown,
within the conditional set of samples for which $\bar{x}$ is constant, the variance is

$$2(n-1)\{1 - 1/(n\bar{x})\}.$$
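
As a small illustration of these quantities, the following sketch computes the variance-to-mean ratio, the index of dispersion and Haldane's conditional variance for a set of quadrat counts. The counts are hypothetical and the function name is an assumption made for the example; nothing here is taken from the data of the paper.

```python
def dispersion_summary(counts):
    """Return (variance/mean ratio, index of dispersion, Haldane's conditional variance).

    counts: quadrat counts x_1..x_n with a non-zero mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("the index of dispersion is not defined when the mean is zero")
    ss = sum((x - mean) ** 2 for x in counts)        # sum of squared deviations
    ratio = ss / (mean * (n - 1))                    # estimator of the variance/mean ratio
    index = ss / mean                                # treated as chi-square with n - 1 d.f.
    cond_var = 2 * (n - 1) * (1 - 1 / (n * mean))    # variance of the index given the mean
    return ratio, index, cond_var

counts = [0, 2, 1, 3, 0, 1, 4, 1, 0, 2]  # hypothetical quadrat counts
ratio, index, cond_var = dispersion_summary(counts)
print(ratio, index, cond_var)
```

A ratio above unity would be read as over-dispersion in the sense described above; the index itself is referred to the $\chi^2$ distribution with $n-1$ degrees of freedom.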


A discussion of the adequacy of the $\chi^2$ approximation when the expectation, $\lambda$, is small is
given by Lancaster (1952) elsewhere in this issue.

The problem for small $\lambda$ has been taken up by Fisher (1950) where further refinements
and modifications are discussed. 

It might be noted that there are serious objections to the unqualified acceptance of 
Clapham’s ratio as an index of non-randomness for it is not in general independent of the 
size of quadrat employed (cf. §§ 3.7 and 3.14; see also Evans, 1952).


3.5a. A more direct and profitable approach to the problem of spatial non-randomness
is to study the distribution of the distance between an individual and its nearest neighbour,
and to compare the observed distribution with that which could be expected on the assumption
of randomness.

If the density of the particles = $\lambda/\pi$, the number occurring in any circle of area A has
a Poisson distribution with parameter $\lambda A/\pi$. Now take any particle at random as centre
and construct circles with radii $r_1$ and $r_2$, where $r_1 < r_2$. The probability that no particle
occurs inside the inner circle is $e^{-\lambda r_1^2}$. The probability that at least one point occurs in the
annulus is $1 - e^{-\lambda(r_2^2 - r_1^2)}$. The probability that the nearest point lies in the annulus is then
$\exp\{-\lambda r_1^2\}[1 - \exp\{-\lambda(r_2 + r_1)(r_2 - r_1)\}]$. The probability that the distance of the nearest
point lies in an element dr at a distance r is obtained by allowing $r_2 \to r_1 = r$. Hence

$$dF(r) = e^{-\lambda r^2}\,2\lambda r\,dr.$$

Alternatively $z = \lambda r^2$ has the exponential distribution

$$f(z) = e^{-z} \quad (0 < z < \infty).$$

It follows that if $r_j$ ($j = 1, 2, \ldots, n$) are a set of independent values, the statistic $2\lambda\sum r_j^2$ is
distributed as $\chi^2$ with 2n degrees of freedom.

3.5b. As an illustration consider the following data referring to Plantago major L. in an
area 2 x 8 metres marked out because of the apparent uniformity of the vegetation in and
around it. The area was divided into 10 strips, each 20 cm. wide, and every 10th individual 
encountered in traversing every 2nd strip was chosen as a centre. In this way 35 centres 
were chosen from a total of 723 plants in the area. In some cases, of course, the nearest 
neighbours lay just outside the area. 

The distance from the centre of one plant to that of its nearest neighbour rarely exceeded 
10 cm., and was determined with sufficient accuracy by means of a ruler.

The frequency distribution of $\lambda r^2$ is shown in the figure and compared with the theoretical
exponential distribution. It is immediately apparent that there are few very small values
of the variate due possibly to competition between nearby individuals; but there is aggregation
nevertheless. For in this example the statistic $\chi^2 = 2\lambda\sum r^2 = 47$ for 70 degrees of freedom,
giving $P(\chi^2 < 47) < 0.02$, so that the mean of the observed distribution is significantly less
than that expected on the null hypothesis of randomness. The reason for this aggregation 
may well be that offspring tend to remain associated together in the neighbourhood of 
their parent. 
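
The lower-tail probability quoted for this example can be checked without tables, because a $\chi^2$ variate with an even number of degrees of freedom has a closed-form distribution function (a finite Poisson sum). The sketch below is an illustration under that identity; the function names are assumptions, and only the values 47 and 70 come from the text.

```python
import math

def nearest_neighbour_statistic(distances, density):
    """Return 2 * lam * sum(r^2), where lam = pi * density (individuals per unit area)."""
    lam = math.pi * density
    return 2.0 * lam * sum(r * r for r in distances)

def chi2_cdf_even_df(x, df):
    """P(chi-square <= x) for a positive even number of degrees of freedom.

    Uses P(chi2_{2n} > x) = sum_{k=0}^{n-1} exp(-x/2) (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = math.exp(-half)  # Poisson term for k = 0
    upper = 0.0
    for k in range(df // 2):
        upper += term
        term *= half / (k + 1)
    return 1.0 - upper

# Values quoted in the text: chi-square = 47 on 70 degrees of freedom (35 centres).
p_lower = chi2_cdf_even_df(47.0, 70)
print(f"P(chi2 < 47) = {p_lower:.4f}")
```

A small lower-tail probability indicates nearest-neighbour distances shorter than expected under randomness, which is the sense in which the text speaks of aggregation.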
If, of course, all we want is the general shape of the distribution of $\lambda r^2$, it is not necessary
for the observations to be independent, and the simplest procedure is then to draw samples 
from every strip, and if there is a shortage of material to use if necessary every plant in the 
delimited area as a centre. 


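[Figure: observed frequency distribution of $\lambda r^2$ compared with the theoretical exponential distribution. Vertical axis: frequency.]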
3.6. There is an alternative derivation of the elementary result of § 3.2 which has a
bearing on a more fundamental issue. 

Imagine the area to be divided into a large number of small cells each of which is poten- 
tially capable of supporting one plant of a given species, and suppose that the chances that 
circumstances will actually permit a cell to support one such plant are extremely small. 
If the fates of the cells are independent and if in a quadrat the number of such cells is large 
and the conditions are similar for all quadrats, the number of plants per quadrat will vary 
in accordance with the Poisson law. 

Observe that this conclusion follows whatever the pattern of cells might be—no matter 
how regular their arrangement. 

The principle which emerges here is that if the elements of a uniformly dense population 
are exposed independently to a serious risk of extermination, the resulting census dis- 
tribution will be Poissonian for large quadrats, whatever the original pattern may have been. 

This phenomenon is well illustrated by the sedge Carex arenaria L. on sand-dunes. In 
the early stages of colonization, its rhizomes grow out in almost straight lines, and aerial 
shoots arise at almost constant intervals. The pattern is essentially that given by a number 
of intersecting ‘dotted’ lines of finite length (see Weaver & Clements, 1938, Fig. 78). At 
a later stage, a marked competition develops, and other species invade the area. Vegetative 
reproduction comes to an end, and the aerial shoots one by one disappear. For a time the 
original pattern may still be discerned, though not indefinitely, and towards the end the 
few survivors seem scattered and to all appearances at random. 

There can be little doubt that this process plays a not unimportant part in maintaining 
that disorderliness which is such a striking feature of the plant carpet. If $G_0(z)$ is the initial
distribution and $G_1(z)$ the new, and p is the probability of individual survival,

$$G_1(z) = G_0(pz+q),$$
$$\Phi_1(u) = \Phi_0(pu), \tag{4}$$

and the new $\kappa_{(r)} = p^r \times$ old $\kappa_{(r)}$.
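
These relations follow in one line from the definitions of § 2. A brief sketch, assuming $q = 1-p$ and writing $\Phi_0$, $\Psi_0$ for the factorial generating functions of the initial distribution:

```latex
\begin{align*}
\Phi_1(u) &= G_1(1+u) = G_0\bigl(p(1+u)+q\bigr) = G_0(1+pu) = \Phi_0(pu),\\
\Psi_1(u) &= \log\Phi_1(u) = \Psi_0(pu) = \sum_{r}\kappa_{(r)}\,p^{r}\,\frac{u^{r}}{r!},
\end{align*}
```

so that each factorial cumulant of order r is multiplied by $p^r$.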
Accompanying this reduction of the cumulants is a change in the form of the distribution,
which becomes markedly J-shaped. In practice it is almost impossible to distinguish between
samples from such a distribution and those arising from a Poisson distribution with the 
same low mean. 


3.7. Some insight into the interpretation of the coefficient $\kappa_{(2)}/\kappa_{(1)} = \mu_2/\mu'_1 - 1$ is gained by
considering the effect of clustering. Suppose for illustration that the randomly distributed 
seedlings of some particular species give rise by vegetative multiplication to clusters of 
upright shoots, and for the purpose of the present argument let us suppose that the clusters 
are so compact in relation to the size of the quadrat that only a negligible proportion are 
cut through by its boundary. If g(z) is the p.g.f. of the distribution of the number of shoots 
in a cluster, the p.g.f. of the number of shoots per quadrat will be 


$$\exp\{\lambda(g(z) - 1)\}.$$

The f.c.g.f. is then

$$\lambda\{\mu_{(1)} u + \mu_{(2)} u^2/2! + \ldots\},$$

where $\mu_{(r)}$ refer to the distribution of shoots per cluster. The coefficient $c_1 = \kappa_{(2)}/\kappa_{(1)}$ for the
compound distribution equals $\mu_{(2)}/\mu_{(1)}$ and is independent of $\lambda$, the average number of
clusters per quadrat. The only information it conveys is about the clusters themselves.

Clearly $c_1 = 0$ if and only if $g(z)$ is linear, that is, if the clusters do not contain more
than one individual.


3.8. In the majority of species of plant and animal the number of potential offspring
per parent is quite large and the chances of individual survival very small. Provided that 
the offspring develop independently of one another and that the expected number per family 
does not fluctuate widely among families, it is reasonable to assume that the actual number 
of offspring in a family has a Poisson distribution. Even when the premises do not strictly 
hold, this is a very convenient simplifying assumption, which I believe to be justified at 
this early stage of development of analytical biology. 

3.9. The simplest compound distribution of the type considered in § 3.7 is in my opinion
the Neyman Type A. It arises by putting $g(z) = e^{m(z-1)}$, a step which can be justified in
circumstances such as those discussed in § 3.8. Its p.g.f. is then

$$\exp\{\lambda(e^{m(z-1)} - 1)\} \tag{5}$$

and $\kappa_{(r)} = \lambda m^r$.


The distribution has been applied to plant quadrat work (Archibald, 1948; Barnes & 
Stanbury, 1951), and to the quadrat sampling of insect larvae (Beall, 1940), where the larvae 
arise from batches of eggs laid at random and are presumed not to spread very far from 
their starting point. 


3.10. In the model we have just considered, the parent plants disappear on giving rise
to the next generation. If, however, they were to persist it would be necessary to write
$g(z) = z e^{m(z-1)}$ as the g.f. of the number of individuals in a cluster.

We then obtain the compound distribution

$$G(z) = \exp\{\lambda(z e^{m(z-1)} - 1)\}, \tag{6}$$


derived otherwise by Thomas (1949). The necessary conditions might well hold in the second 
year of colonization of open ground (Barnes & Stanbury, 1951). The distribution is not 
genuinely applicable if the clusters are not compact, or if the process extends to later
generations, so that its usefulness is limited. The distribution is sometimes unfortunately 
called the double Poisson, a term which could more appropriately be given to Neyman’s 
Type A distribution, and has been regarded by Archibald (1950) as the simplest of the 
contagious distributions. One reason why Thomas’s distribution sometimes gives slightly 
better graduations than the ordinary Neyman Type A is, I think, because in the latter 
$g(z) = e^{m(z-1)}$ fails to reflect the effects of the competition which so often occurs within a
compact aggregate of individuals with similar requirements. Now in a case of this sort
it can be shown that the classical binomial distribution is more appropriate than the
Poisson. We then have

$$G(z) = \exp\{\lambda[(pz+q)^n - 1]\}. \tag{7}$$

The graduation of Miss Archibald's data on Carex flacca Schreb. by this function (with
$n = 3$; $p = 0.295416$; $\lambda = 1.59323$) is shown in Table 1, col. iv, and compared with the excellent
graduation (col. iii) already given by Miss Archibald using the Thomas series.


Table 1

Plants          Observed
per quadrat     frequency     (iii)         (iv)
0               181           174.31        177.45
1               118           130.72        124.38
2                97            92.14         95.75
3                54            53.31         54.03
4                32            27.01         27.37
5                 9            12.50         12.55
6                 5             3.45          5.22
7                 3           [illegible]   [illegible]
8                 1
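
The graduation by function (7) can be reproduced with a standard recursion for compound Poisson probabilities, obtained by differentiating $G(z) = \exp\{\lambda(g(z)-1)\}$. The sketch below is not the author's computing scheme; the function name is an assumption, and the total of 500 quadrats is an assumption adopted because it reproduces the leading entries of column (iv).

```python
import math

def compound_poisson_pmf(lam, cluster_pmf, r_max):
    """Probabilities p_0..p_r_max for G(z) = exp{lam * (g(z) - 1)}.

    cluster_pmf[k] is the probability of k individuals per cluster (coefficients of g).
    Recursion: r * p_r = lam * sum_{k=1..r} k * g_k * p_{r-k}.
    """
    g = list(cluster_pmf) + [0.0] * (r_max + 1)  # pad so g[k] exists for every k <= r_max
    p = [math.exp(lam * (g[0] - 1.0))]
    for r in range(1, r_max + 1):
        s = sum(k * g[k] * p[r - k] for k in range(1, r + 1))
        p.append(lam * s / r)
    return p

# Parameters quoted in the text for function (7).
n, prob, lam = 3, 0.295416, 1.59323
binomial = [math.comb(n, k) * prob ** k * (1 - prob) ** (n - k) for k in range(n + 1)]

total_quadrats = 500  # assumed total, see the note above
expected = [total_quadrats * pr for pr in compound_poisson_pmf(lam, binomial, 8)]
for plants, value in enumerate(expected):
    print(plants, round(value, 2))
```

The same function serves for any cluster-size distribution of § 3.7 to § 3.11 once the coefficients of $g(z)$ are supplied.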


3.11. If the number of individuals in a cluster has the Pascal or negative binomial distribution
$g(z) = (1-\pi)^k (1-\pi z)^{-k}$, a very reasonable possibility, we obtain as the g.f. of
the number of individuals per quadrat the expression

$$G(z) = \exp\left\{\lambda\left[\left(\frac{1-\pi}{1-\pi z}\right)^k - 1\right]\right\}. \tag{8}$$


I propose to call this the generalized Polya-Aeppli distribution. The special case where 
k = 1, so that g(z) is geometric, is given in Polya (1930) in another connexion, and its pro- 
perties are mentioned by Anscombe (1950). This case, with g(z) geometric, applies as a close 
approximation where the clusters have arisen by branching stochastic processes analogous 
to those considered by D. G. Kendall (1948) and others in continuous time. 





3.12. If in the previous example k is very small,

$$\frac{g(z)-1}{k} \simeq \log\left(\frac{1-\pi}{1-\pi z}\right).$$

The g.f. of the distribution is then that of the Pascal distribution

$$G(z) = (1-\pi)^{k\lambda}(1-\pi z)^{-k\lambda}. \tag{9}$$


This result is clearly equivalent to Quenouille’s theorem (1949), originally given in con- 
nexion with the distribution of the number of bacteria per Petri dish on the assumption 
that the number of colonies had a Poisson distribution and the number of bacteria per 
colony a logarithmic one. 
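
The limiting step and its link with the logarithmic distribution can be written out as follows. This is a sketch only: it assumes that the product $k\lambda$ is held fixed as k becomes small, writes $\pi$ for the parameter of the negative binomial, and introduces the symbol $\Lambda$ purely for illustration.

```latex
\begin{align*}
g(z) &= \exp\Bigl\{k\log\frac{1-\pi}{1-\pi z}\Bigr\}
      = 1 + k\log\frac{1-\pi}{1-\pi z} + O(k^{2}),\\
\lambda\{g(z)-1\} &\to k\lambda\,\log\frac{1-\pi}{1-\pi z},
\qquad
G(z) \to \Bigl(\frac{1-\pi}{1-\pi z}\Bigr)^{k\lambda}.\\
\intertext{Equivalently, with logarithmic clusters $\gamma(z)=\log(1-\pi z)/\log(1-\pi)$
and $\Lambda=-k\lambda\log(1-\pi)$,}
\exp\{\Lambda[\gamma(z)-1]\} &= \exp\bigl\{-k\lambda[\log(1-\pi z)-\log(1-\pi)]\bigr\}
 = \Bigl(\frac{1-\pi}{1-\pi z}\Bigr)^{k\lambda}.
\end{align*}
```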
3.13. As an illustration of the dangers to be faced in interpreting the parameters of a
frequency distribution in terms of the biological situation, it may be noted that

$$\exp\{\lambda[g(z)-1]\} = \exp\{\Lambda[\gamma(z)-1]\}$$

if $\gamma(z) = \dfrac{a + g(z)}{a+1}$ and $\Lambda = \lambda(a+1)$,

where a is arbitrary. Hence, even if this kind of model is appropriate, the frequency data
tell us nothing of the term independent of z in $g(z)$. It is advisable to fix this arbitrarily as
zero unless other information is available, and to interpret $\lambda$ as the density of actual rather
than potential clusters.


3.14a. Unlike the previous examples, we now consider the case where the aggregates
are not compact and the quadrat small in comparison. 

Consider a large two-dimensional area ($-n < x < n$; $-n < y < n$) and a number ($4n^2c/\pi$)
of centres of aggregation $(x_j, y_j)$ scattered at random through the region. Thus c is the
average number of centres in a circle of unit radius. At first for simplicity it is supposed that
all clusters contain M potential individuals, and the co-ordinates of the individuals belonging
to the jth cluster are distributed as a random sample from the bivariate normal population

$$dP = \pi^{-1}\exp\{-(x-x_j)^2 - (y-y_j)^2\}\,dx\,dy.$$


This is the familiar distribution of shots round a bull’s eye. 

In this way the unit of distance is taken as the root-mean-square deviation from the
centre of the population. For the present we place a quadrat at the origin. Its size is $\pi h^2$,
where h is much less than unity. Later we shall allow n to tend to infinity, so that it will be
immaterial whether the quadrat is at the origin or not. Since the quadrat is quite small the
probability distribution of the number of individuals falling into it from the jth cluster will
be given by

$$g(z) = \exp\{\lambda_j(z-1)\},$$

where $\lambda_j = q\exp\{-x_j^2 - y_j^2\}$ and $q = M v h^2$,

v being the probability of individual survival. The contributions from all clusters may be
compounded by multiplying their generating functions. The resultant distribution is then

$$G(z) = \prod_j \exp\{\lambda_j(z-1)\}.$$


The distribution so far is based on a particular disposition of the centres $(x_j, y_j)$. We shall
require $G_n(z)$, the mean value of $G(z)$ for all possible arrangements of the centres. Since
$G(z)$ is the product of independent distributions it follows that

$$G_n(z) = [\mathcal{E}\{e^{\lambda_j(z-1)}\}]^{4n^2c/\pi} = [1 + \mathcal{E}\{e^{\lambda_j(z-1)} - 1\}]^{4n^2c/\pi}.$$

The f.m.g.f. of this distribution is

$$\Phi_n(u) = \left[1 + \frac{1}{4n^2}\int_{-n}^{n}\int_{-n}^{n} (e^{uq\exp\{-x^2-y^2\}} - 1)\,dx\,dy\right]^{4n^2c/\pi}.$$

By the polar transformation $x = r\cos\alpha$, $y = r\sin\alpha$, the double integral may be written

$$\int_0^{2\pi} d\alpha \int_0^{n\theta} (e^{uq\exp\{-r^2\}} - 1)\,r\,dr,$$

where $1 < \theta < \sqrt{2}$ for the integrand does not change sign. It follows that


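$$\Phi(u) = \lim_{n\to\infty}\Phi_n(u) = \exp\left\{c\int_0^{\infty} (e^{uq\exp\{-r^2\}} - 1)\,2r\,dr\right\}.$$

Hence

$$\Psi(u) = c\int_0^q (e^{ut}-1)\,\frac{dt}{t} = c\sum_{j=1}^{\infty}\frac{q^j u^j}{j\cdot j!}. \tag{10}$$

The coefficient of $u^j/j!$ is the jth factorial cumulant

$$\kappa_{(j)} = c\,q^j/j.$$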
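
The passage from the limiting f.m.g.f. to (10) rests on a change of variable. A short sketch of that step, in the notation of § 2 and with the substitution $t = q e^{-r^2}$:

```latex
\begin{align*}
\Psi(u) &= \log\Phi(u) = c\int_0^{\infty}\bigl(e^{uq\,e^{-r^{2}}}-1\bigr)\,2r\,dr,\\
t = q e^{-r^{2}} &\;\Longrightarrow\; dt = -2rt\,dr,\qquad 2r\,dr = -\frac{dt}{t},\\
\Psi(u) &= c\int_0^{q}(e^{ut}-1)\,\frac{dt}{t}
         = c\sum_{j=1}^{\infty}\frac{u^{j}}{j!}\int_0^{q}t^{\,j-1}\,dt
         = c\sum_{j=1}^{\infty}\frac{q^{j}}{j}\,\frac{u^{j}}{j!},
\end{align*}
```

which identifies the jth factorial cumulant as $c\,q^j/j$.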


The following relations may then be deduced: 


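$$\kappa_{(1)} = cq, \text{ as is otherwise obvious,} \tag{11}$$
$$\kappa_{(2)}/\kappa_{(1)} = \tfrac{1}{2}q. \tag{12}$$
The criterion $\kappa_{(1)}\kappa_{(3)}/\kappa_{(2)}^2 = \tfrac{4}{3}$.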


3.14b. It should be noted that $c_1 = \kappa_{(2)}/\kappa_{(1)}$ increases with the number in a cluster, as in
the distributions referred to in §§ 3.7 to 3.12. The present case is different, however, in that
$c_1$ necessarily increases with quadrat size, a phenomenon which is well known to statistical
ecologists. The following data on Plantago major L. were obtained in the same locality
using two quadrat sizes, one twice the other.


                                            Distribution   Distribution   Ratio (i)/(ii)
                                            (i)            (ii)           theory
No. of observations                         400            200
$\kappa_{(2)}/\kappa_{(1)}$                 1.32           2.40           1:2
$\kappa_{(3)}/\kappa_{(2)}$                 2.00           4.06           1:2
$\kappa_{(1)}\kappa_{(3)}/\kappa_{(2)}^2$   1.52           1.69           (i) = (ii) = 4/3


From the consistency shown in the above comparison a measure of support can be drawn 
for the hypothesis that the model discussed here is a reasonably adequate representation of 
the actual spatial pattern, and to my mind this support is of a kind different from but 
supplementary to that afforded by the fitting of frequency data alone (see Table 2, Gradua- 
tion I). 

3.14c. As in most compound distributions there is no simple formula for the general
term of $G(z)$. In practice, however, we can obtain actual numerical graduations simply by
expanding the g.f. in a systematic way. We require $p_r$, the coefficient of $z^r$ in $\exp\{\Psi(z-1)\}$.

Since

$$\left[\frac{\partial^r}{\partial z^r}\int_0^q (e^{(z-1)t} - 1)\,\frac{dt}{t}\right]_{z=0} = \int_0^q t^{r-1}e^{-t}\,dt = \Gamma_q(r) \quad (r > 0),$$

the coefficient of $z^r$ in $\Psi(z-1)$ is $a_r = c\,\Gamma_q(r)/r\,\Gamma(r)$.

In the notation employed in Pearson's actual table of the Incomplete Gamma Function,
$a_r = (c/r)\,I(q/\sqrt{r},\, r-1)$.

Alternatively $a_r = \dfrac{c}{r!}\sum_{j=0}^{\infty}\dfrac{(-1)^j q^{r+j}}{(r+j)\,j!}$.

From this it is clear that $a_r$ tend rapidly to zero as r increases.
We may now write

$$G(z) = e^{a_0}\prod_{r=1}^{\infty}\exp(a_r z^r),$$

where $a_0 = -\sum_{r=1}^{\infty} a_r$, since $G(1) = 1$.

$G(z)$ is obtained as a power series by systematically multiplying exponential series. The
computation is not formidable unless the variate extends to 15 or more. The simplest way
of estimating the parameters is to substitute the sample values of $\kappa_{(1)}$ and $\kappa_{(2)}$ in relations
(11) and (12) of § 3.14a. If q is small, there is good reason to think the procedure efficient for
the estimation of the product cq, for the distribution then approximates to the Poisson form.

In the numerical example which follows the estimates by moments are q = 2.63518 and
c = 0.323507. Even so, the fit is quite good (Graduation I). The data relate to the distribution
of the number of plants of Plantago major present in quadrats of area 100 sq. cm. laid
down in grassland.
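
The moment estimates can be retraced from the observed frequencies of Table 2. The sketch below is a reading of the method rather than the author's own working: it equates the sample factorial cumulants to $\kappa_{(1)} = cq$ and $\kappa_{(2)}/\kappa_{(1)} = q/2$, divides by the number of quadrats rather than by one less, and takes the partly illegible frequencies for 6, 7 and 9 plants as 4, 3 and 1 (an assumption consistent with the total of 400).

```python
def factorial_moment(freq, order):
    """Sample factorial moment of the given order from a {count: frequency} table."""
    total = sum(freq.values())
    acc = 0
    for x, f in freq.items():
        falling = 1
        for i in range(order):
            falling *= (x - i)  # x (x-1) ... (x-order+1)
        acc += falling * f
    return acc / total

# Observed frequencies of Table 2; the entries for 6, 7 and 9 plants are assumed readings.
observed = {0: 235, 1: 81, 2: 43, 3: 18, 4: 9, 5: 6, 6: 4, 7: 3, 9: 1}

m1 = factorial_moment(observed, 1)
m2 = factorial_moment(observed, 2)
m3 = factorial_moment(observed, 3)

# Factorial cumulants from factorial moments (same relations as cumulants from moments).
k1 = m1
k2 = m2 - m1 ** 2
k3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3

q_hat = 2 * k2 / k1            # from kappa_(2) / kappa_(1) = q / 2
c_hat = k1 / q_hat             # from kappa_(1) = c q
criterion = k1 * k3 / k2 ** 2  # theoretical value 4/3 under the model

print(q_hat, c_hat, k2 / k1, k3 / k2, criterion)
```

Under these assumptions the estimates agree closely with the values of q and c quoted above, and the criterion can be compared with the theoretical value 4/3 of § 3.14a.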


Table 2

Plants         Observed     Graduation   Graduation   Graduation   Graduation
per quadrat    frequency    I            II           III          IV
0              235          240.91       235.37       232.05       234.00
1               81           72.35        80.13        85.36        80.23
2               43           39.67        40.74        39.97        44.00
3               18           22.48        20.95        20.05        19.41
4                9           12.22        10.86        10.40        10.27
5                6            6.33         5.67         5.50         5.46
6                4            3.15         2.97         2.94         2.92
7                3            1.52         1.56         1.59         1.58
8                -            0.72                      0.87         0.86
9                1            0.33                      0.48         0.48
10               -            0.15                      0.26         0.26
11               -            0.07
12               -            0.04
13               -            0.02
7 and over       4            2.85         3.3          3.73         3.71
Total          400
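
Graduation I can be retraced from the quoted estimates by the expansion of § 3.14c. The sketch below is one way of organising that computation, not necessarily the author's: the coefficients $a_r = c\,\Gamma_q(r)/r!$ are obtained from a convergent series for the incomplete gamma function, and the probabilities follow from the recursion $r\,p_r = \sum_k k\,a_k\,p_{r-k}$. The function names and truncation limits are assumptions.

```python
import math

def cluster_coefficients(c, q, r_max, terms=200):
    """a_r = c * Gamma_q(r) / r! for r = 1..r_max (index 0 is left as 0.0).

    Uses Gamma_q(r) = exp(-q) q^r * sum_{k>=0} q^k / (r (r+1) ... (r+k)).
    """
    a = [0.0] * (r_max + 1)
    for r in range(1, r_max + 1):
        term = 1.0 / r
        series = term
        for k in range(1, terms):
            term *= q / (r + k)
            series += term
        a[r] = c * math.exp(-q) * q ** r * series / math.factorial(r)
    return a

def graduation(c, q, n_probs, r_max=60):
    """Probabilities p_0..p_{n_probs-1} of G(z) = exp{a_0 + sum_r a_r z^r}."""
    a = cluster_coefficients(c, q, r_max)
    p = [math.exp(-sum(a[1:]))]  # p_0 = exp(a_0), with a_0 = -sum of the a_r
    for r in range(1, n_probs):
        p.append(sum(k * a[k] * p[r - k] for k in range(1, r + 1)) / r)
    return p

# Estimates quoted in the text; 400 quadrats as in Table 2.
expected = [400 * pr for pr in graduation(0.323507, 2.63518, 14)]
for plants, value in enumerate(expected):
    print(plants, round(value, 2))
```

The leading values can be compared with the Graduation I column; small differences in the second decimal place are to be expected from rounding in the original hand computation.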


3.15. It is instructive to consider the distribution of the difference between the readings
of two quadrats taken from the model studied in the previous paragraph. 

(a) If the quadrats are laid down independently of one another, the difference between 
their readings has p.g.f. G(z) G(z—), so that the cumulant generating function is 


log G(e') + log G(e-*). 


The odd order cumulants vanish. and those of even order are twice those of the distribution 
of a single reading. In particular, 


Ky = 2cq(1+ 49), 
ky = 2og(l +4 + 298+ 1). 


(b) If a small quadrat is divided into two equal parts, pairs of readings can be taken
simultaneously and compared. As in §3-14a, consider the quadrats placed at the origin and
the contributions made to them by the $j$th aggregate. The distribution of the difference
between the two values has p.g.f.

$$g_j(z) = \exp\{\lambda_j(z-1) + \lambda_j(z^{-1}-1)\},$$

where $\lambda_j = q\exp\{-x_j^2-y_j^2\}$.
Proceeding exactly as in §3-14a we find, after compounding the contributions from all
aggregates and taking the mean value for all possible positions of the centres of aggregation,
and proceeding to the limit, that the resulting generating function is

$$G(z) = \exp\Big\{c\int_0^q \big(e^{u(z+z^{-1}-2)}-1\big)\frac{du}{u}\Big\}.$$

The cumulant g.f. is then
$$\Psi(t) = c\int_0^q \big(e^{2u(\cosh t-1)}-1\big)\frac{du}{u},$$

giving
$$\kappa_{2n+1} = 0,$$
$$\kappa_2 = 2cq,$$
$$\kappa_4 = 2cq(1+3q),$$
$$\kappa_6 = 2cq(1+15q+20q^2).$$


If aggregation is negligible so that q is small, the distribution degenerates into the difference 
between two independent Poisson variates (Irwin, 1937; Skellam, 1946). 

(c) If the twin quadrats have their centres $2s$ units apart with the origin midway between
them, the g.f. of the difference between their readings due solely to the $j$th aggregate is

$$G_j(z) = \exp\{\lambda_A(z-1)+\lambda_B(z^{-1}-1)\},$$

where
$$\lambda_A = q\exp\{-x_j^2-y_j^2-s^2+2s(x_j\cos\alpha+y_j\sin\alpha)\},$$
$$\lambda_B = q\exp\{-x_j^2-y_j^2-s^2-2s(x_j\cos\alpha+y_j\sin\alpha)\},$$

and $\alpha$ = the angle between the X-axis and the line joining the centres of the twin quadrats.
Continuing exactly as before we obtain the cumulant generating function

$$\Psi(t) = \frac{c}{\pi}\int_0^{2\pi}\!\!\int_0^{\infty}\Big(\exp\big\{q(e^{t}-1)e^{-r^2-s^2+2sr\cos\gamma}+q(e^{-t}-1)e^{-r^2-s^2-2sr\cos\gamma}\big\}-1\Big)\,r\,dr\,d\gamma.$$

By picking out the coefficient of $\tfrac12 t^2$ it will be seen that
$$\kappa_2 = \frac{c}{\pi}\int_0^{2\pi}\!\!\int_0^{\infty}\Big(q\,e^{-r^2-s^2+2sr\cos\gamma}+q^2e^{-2r^2-2s^2}\{e^{4sr\cos\gamma}-1\}\Big)\,2r\,dr\,d\gamma.$$

Since
$$\frac{1}{2\pi}\int_0^{2\pi}e^{2b\cos\gamma}\,d\gamma = I_0(2b) = \sum_{j=0}^{\infty}b^{2j}/(j!)^2,$$

where $I_0(x)$ is the modified Bessel function of the first kind, and since
$$\int_0^{\infty}e^{-R}I_0(2sR^{1/2})\,dR = \sum_{j=0}^{\infty}\frac{s^{2j}}{j!}\int_0^{\infty}\frac{R^je^{-R}}{j!}\,dR = e^{s^2},$$

it follows that
$$\kappa_2 = 2cq+cq^2(1-e^{-2s^2}). \qquad (14)$$

When $s = 0$, this result degenerates into that for adjacent quadrats, and, when $s \to \infty$,
becomes that for independent quadrats.

Using twin quadrats with centres 10 cm. apart, the distribution of the difference between
their readings was

Difference   -5   -4   -3   -2   -1     0    1    2    3    4    5    6   Total
Frequency     1    2    5   11   30   102   28    9    7    2    2    1     200

The second moment about zero is $m_2 = 2.105$. The values of $c$ and $q$ have already been given as $c = 0.3235$ and $q = 2.635$. If
these values are substituted into (14) it is found that $s = 0.3$ approx., corresponding to
a root-mean-square dispersion of about 17 cm.
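
The arithmetic of this step can be checked with a short sketch. It assumes that (14) reads $\kappa_2 = 2cq + cq^2(1-e^{-2s^2})$ and that $\kappa_2$ is equated to the observed second moment about zero (the sample mean of the differences being close to zero); both are assumptions of the example rather than statements taken from the text.

```python
import math

differences = list(range(-5, 7))
frequencies = [1, 2, 5, 11, 30, 102, 28, 9, 7, 2, 2, 1]

n = sum(frequencies)
# Second moment of the differences about zero.
m2 = sum(f * d * d for d, f in zip(differences, frequencies)) / n

c, q = 0.3235, 2.635  # moment estimates quoted in the text

# Invert kappa_2 = 2cq + c q^2 (1 - exp(-2 s^2)) for s, with kappa_2 = m2.
ratio = (m2 - 2 * c * q) / (c * q * q)
s = math.sqrt(-0.5 * math.log(1.0 - ratio))
print(n, m2, round(s, 2))
```

The value of $s$ obtained in this way rounds to 0.3, in agreement with the figure quoted.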

Results of this kind are helpful in judging the adequacy of the basic model, for estimates 
of some particular aspect of the physical situation can always be compared with the direct 
evidence afforded to our senses; whilst sets of estimates obtained by varying the choice of 
quadrat size or distance between quadrats should be reasonably consistent among them- 
selves. 


3-16. From the theoretical standpoint, the distributions we have so far considered are 
rarely applicable to patterns which have arisen by processes operating over a large number 
of generations. We shall now suppose that a quadrat contains a number of annual plants,
that one such plant will on the average give rise to $\eta$ offspring within the quadrat and an
unspecified number outside, and that an average of $\epsilon$ plants per quadrat originate from
parents outside that quadrat. The reasonable simplifying assumption that the distribution
of offspring is Poissonian leads to the equation

$$G(z) = e^{\epsilon(z-1)}\,G\big(e^{\eta(z-1)}\big) \qquad (15)$$

as the condition for a stochastic equilibrium. This equation has been discussed by Haldane
(1949) and Skellam (1948) in connexion with an analogous problem in evolutionary genetics.
The equation may be written in the form

$$\Psi(u) = \epsilon u+\Psi\big(\eta u+\tfrac12\eta^2u^2+\tfrac16\eta^3u^3+\dots\big),$$
whence
$$\kappa_{(1)} = \epsilon/(1-\eta),$$
$$\kappa_{(2)} = \epsilon\eta^2/(1-\eta)(1-\eta^2),$$
$$\kappa_{(3)} = \epsilon\eta^3(1+2\eta^2)/(1-\eta)(1-\eta^2)(1-\eta^3),$$
$$\kappa_{(1)}\kappa_{(3)}/\kappa_{(2)}^2 = (1+2\eta^2)(1+\eta)/[\eta(1+\eta+\eta^2)] = 2 \text{ if } \eta = 1.$$


The Pascal distribution is known to provide excellent approximations (Skellam, 1948) 
and to be exact for the corresponding problem in continuous time (D. G. Kendall, 1948). 
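
As a check on the cumulant relations of this section, the following sketch computes the first three factorial cumulants by equating coefficients of $u$, $u^2$ and $u^3$ in $\Psi(u) = \epsilon u + \Psi(e^{\eta u}-1)$, and evaluates the ratio $\kappa_{(1)}\kappa_{(3)}/\kappa_{(2)}^2$. The recursion used for $\kappa_{(3)}$ is derived here for the example and the function names are illustrative.

```python
def equilibrium_factorial_cumulants(eps, eta):
    """First three factorial cumulants implied by the equilibrium condition."""
    k1 = eps / (1 - eta)
    k2 = eta**2 * k1 / (1 - eta**2)
    k3 = eta**3 * (k1 + 3 * k2) / (1 - eta**3)
    return k1, k2, k3

def shape_ratio(eps, eta):
    """kappa_(1) kappa_(3) / kappa_(2)^2, which does not depend on eps."""
    k1, k2, k3 = equilibrium_factorial_cumulants(eps, eta)
    return k1 * k3 / k2**2

for eta in (0.2, 0.5, 0.9, 0.999):
    print(eta, round(shape_ratio(1.0, eta), 4))
```

As $\eta$ approaches 1 the ratio approaches 2, which is the value taken by the same ratio for the Pascal distribution.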


3-17. If a discrete distribution ($x = 0, 1, 2, \dots$) can be represented approximately by
$f^*(x)$, then a closer approximation can be found in terms of $f^*(x)$ and its receding differences.
For if $f^*(x) = 0$ for $x < 0$ it is easily seen that

$$\sum_{x=0}^{\infty}(1+u)^x\,\nabla f^*(x) = -u\sum_{x=0}^{\infty}(1+u)^xf^*(x),$$
and by repetition that
$$\sum_{x=0}^{\infty}(1+u)^x(-\nabla)^rf^*(x) = u^r\sum_{x=0}^{\infty}(1+u)^xf^*(x).$$

If now $L(u)$ is a polynomial it follows that $f(x) = L(-\nabla)f^*(x)$ has f.m.g.f.

$$\Phi(u) = L(u)\,\Phi^*(u), \qquad (16a)$$
where $\Phi^*$ refers to $f^*$.
If
$$L(u) = \sum a_ru^r/r! = \exp\sum b_ru^r/r!$$
it follows that
$$b_r = \kappa_{(r)}-\kappa^*_{(r)}. \qquad (16b)$$

The well-known relations expressing $a_r$ in terms of $b_r$ are greatly simplified if the parameters
of $f^*$ are chosen so that the earlier $\kappa^*_{(r)} = \kappa_{(r)}$.
Owing to the very considerable sampling error of the higher $\kappa_{(r)}$ it is not advisable to
estimate the coefficients of $L$ by substituting sample values in (16b).

A more satisfactory procedure is to select a really suitable ‘kernel’ and to improve it by
only one or two correction terms. If $f(x)$ is the graduation at any stage, $E(x)$ the error and
$aC(x)$ the next correction term to be applied, then the coefficient $a$ may be determined by
the formula
$$a = \sum QE/\sum QC,$$
where $Q = C/f$, much in accordance with the principle of minimum $\chi^2$.

In Table 2, col. iv, is shown a graduation given by the Pascal distribution,

$$f^*(x) = (1-\theta)^k\,k(k+1)\dots(k+x-1)\,\theta^x/x!,$$

with $\theta = 0.568517$ and $k = 0.647015$. Both are estimated by means of the first two factorial
cumulants of the observed distribution.

Col. v gives the graduation based on

$$f(x) = f^*(x)+0.0084\,\nabla^3f^*(x).$$


Despite the remarkably close fitting (or over-fitting) achieved by series developments of 
this kind, I do not feel that such graduations contribute directly to the understanding of 
the biological issues involved. 
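
The estimation of the Pascal parameters from the first two factorial cumulants can be sketched as follows. The example assumes the observed frequencies of Table 2, with the frequency at $x = 6$ taken as 4 (the value implied by the total of 400), and uses the Pascal relations $\kappa_{(1)} = k\rho$, $\kappa_{(2)} = k\rho^2$ with $\rho = \theta/(1-\theta)$; the variable names are illustrative.

```python
observed = [235, 81, 43, 18, 9, 6, 4, 3, 0, 1]  # frequencies for x = 0..9
N = sum(observed)

k1 = sum(x * f for x, f in enumerate(observed)) / N            # kappa_(1)
m2f = sum(x * (x - 1) * f for x, f in enumerate(observed)) / N  # 2nd factorial moment
k2 = m2f - k1**2                                                # kappa_(2)

# Pascal distribution: kappa_(1) = k*rho, kappa_(2) = k*rho^2.
rho = k2 / k1
theta = rho / (1 + rho)
k = k1 / rho

def pascal(x, k, theta):
    """f*(x) = (1 - theta)^k k(k+1)...(k+x-1) theta^x / x!"""
    value = (1 - theta) ** k
    for i in range(x):
        value *= (k + i) * theta / (i + 1)
    return value

fitted = [N * pascal(x, k, theta) for x in range(10)]
print(round(theta, 6), round(k, 6))
print([round(v, 2) for v in fitted])
```

The estimates agree with the quoted values of $\theta$ and $k$ to the figures given, and the fitted frequencies lie close to those of the Pascal graduation in Table 2.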


3-18. If a census distribution has a Gram-Charlier development

$$f(x) = \Big(1+\sum_{r=2}a_r(-\nabla)^r/r!\Big)\,e^{-\lambda}\lambda^x/x!$$
and if the individual organisms are subjected to a process of random selection as in §3-6,
it will be seen by writing $pu$ for $u$ in $\Phi(u)$, expressed as in (16a), that the new probability
function is
$$f_p(x) = \Big(1+\sum_{r=2}a_r(-p\nabla)^r/r!\Big)\,e^{-\lambda p}(\lambda p)^x/x!. \qquad (17)$$

If $p$ becomes small it is clear that the absolute effect of the operator becomes negligible.
It is of interest, however, that the values in the tail of the distribution though vanishingly
small still remain abnormally disproportionate relative to those of the Poisson kernel, for
the effect of the operation $\{p^r(-\nabla)^r\}(\lambda p)^x/x!$ relative to $(\lambda p)^x/x!$ as $p \to 0$ tends to $x^{(r)}/\lambda^r$
(with $x^{(r)} = x(x-1)\cdots(x-r+1)$), and this vanishes only when $x < r$.

This effect is illustrated numerically in Table 3. The initial p.f. was $f_0(x) = (1+\tfrac15\nabla^2)\,e^{-1}/x!$.
In col. ii is shown the distribution resulting from random selection with $p = \tfrac1{10}$, calculated
by formula (17). The Poisson kernel is given in col. iii. The ratios between the corresponding
values of the distribution and the kernel are given in col. iv, and the limiting values of those
ratios in col. v.

In a sense, the vestigial remains of former non-randomness seem to be concentrated in the 
tail, where unfortunately in practice there are usually insufficient observations to permit 
the reconstruction of the archetype. 


Table 3

(i)    (ii)        (iii)       (iv)    (v)
0      0.906647    0.904837    1.00    1.00
1      0.087046    0.090484    0.96    1.00
2      0.005981    0.004524    1.32    1.40
3      0.000314    0.000151    2.08    2.20
4      0.000012    0.000004    3.2     3.40
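
The entries of Table 3 can be reproduced by a few lines of arithmetic. The sketch below assumes that the initial p.f. is $(1+\tfrac15\nabla^2)e^{-1}/x!$ and that $p = \tfrac1{10}$, which is the reading that reproduces the tabulated figures; the function names are illustrative.

```python
import math

def poisson(x, mean):
    """Poisson probability, taken as zero for negative x."""
    return math.exp(-mean) * mean**x / math.factorial(x) if x >= 0 else 0.0

def thinned(x, lam=1.0, a=0.2, p=0.1):
    """(1 + a p^2 nabla^2) applied to the Poisson kernel with mean lam*p."""
    m = lam * p
    second_diff = poisson(x, m) - 2 * poisson(x - 1, m) + poisson(x - 2, m)
    return poisson(x, m) + a * p * p * second_diff

def limiting_ratio(x, lam=1.0, a=0.2):
    """Limit of distribution/kernel as p -> 0: 1 + a x(x-1)/lam^2."""
    return 1 + a * x * (x - 1) / lam**2

for x in range(5):
    f, kernel = thinned(x), poisson(x, 0.1)
    print(x, f"{f:.6f}", f"{kernel:.6f}", f"{f / kernel:.2f}", f"{limiting_ratio(x):.2f}")
```

The ratios in col. iv remain well away from unity in the tail even though the absolute probabilities there are very small.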
3-19. The species which are present in a region have varying abilities to tolerate the
particular sets of conditions which exist locally. The slightest change in the vital coefficient 
in a dynamical system of interacting and competing species can have profound effects on 
the abundance and survival of the various components (Volterra, 1931; Skellam, 1951). 

As a result, minute environmental differences are often reflected in a striking way in the 
composition of the flora and fauna. Even in small areas differences in the composition of 
the soil commonly occur sufficient to produce an irregular and perhaps ill-defined patch- 
work in the structure of the associated plant carpet. 

This patchiness is particularly well marked in an unstable vegetation system, as, for 
example, where grassland gives way to heather moor, for the succession from one phase to 
the next does not proceed at the same rate everywhere. Stable climax vegetation has a 
more uniform texture, but even here we find a mosaic of variability determined by the 
disposition of the more dominant organisms. 

Such heterogeneity is itself largely the outcome of apparently random events. Patches 
of shade plants may occur where a tree happens to have established itself, or nitrophilous 
species where animal excreta has enriched the soil. Irregular channels are eroded in uniform 
surfaces by the action of water and the sun cracks of drying mud have their own peculiar 
flora and fauna (see Weaver & Clements, 1938, Fig. 79). 


3-20. In previous theorems we have considered a number of distributions arising from
quadrat sampling under uniform conditions. Let $G(z, \theta)$ denote such a distribution. If now
conditions vary in different parts of the region being sampled, so that the parameter $\theta$ has
a distribution of its own, the resulting distribution will be

$$G(z) = \int G(z, \theta)\,dF(\theta),$$
and similar expressions follow for $\phi$ and $\Phi$.

The simplest case is that of the Poisson distribution with variable $\lambda$,

$$\Phi(u) = \int_0^{\infty}e^{\lambda u}\,dF(\lambda).$$
Hence the factorial moments of the distribution are the power moments of the distribution
of $\lambda$. Unbiased estimates of the moments of the unknown distribution of $\lambda$ are therefore
given by the sample values of the factorial moments of the census distribution.

(a) In the familiar case studied by Greenwood & Yule (1920) $\lambda$ is a gamma variate. Hence

$$\Phi(u) = \int_0^{\infty}e^{\lambda u}\,e^{-\lambda/\gamma}\lambda^{k-1}\gamma^{-k}\,d\lambda/\Gamma(k) = (1-\gamma u)^{-k},$$

the f.m.g.f. of the Pascal distribution.

(b) If $\lambda = a\nu$, $a$ being constant, then $\Phi(u) = \phi(au)$, where $\phi$ is the m.g.f. of the distribution
of $\nu$.

In particular, if $\nu$ is approximately equivalent to a Poisson variate so that

$$\phi(t) = \exp\{m(e^{t}-1)\},$$
we find that $\Phi(u) = \exp\{m(e^{au}-1)\}$, the f.m.g.f. of Neyman’s distribution of Type A.

(c) When the basic distribution is the Pascal
[f.m.g.f. $= (1-\gamma u)^{-k}$]
in which the parameter $k = a\nu$ varies as in the previous example, we obtain
$$\Phi(u) = \sum_{\nu=0}^{\infty}(1-\gamma u)^{-a\nu}e^{-m}m^{\nu}/\nu! = \exp\{m[(1-\gamma u)^{-a}-1]\}, \qquad (19)$$
which is the f.m.g.f. of a generalized Polya-Aeppli distribution (3-11) with $\tau = \gamma/(1+\gamma)$.


3-21a. As an illustration of the effect of heterogeneity on distributions arising from the
use of small quadrats let us generalize the model of §3-14 by taking $c$ (the expected number
of clusters in a unit circle) as a gamma variate

$$dP = \theta^{\omega}e^{-\theta c}c^{\omega-1}\,dc/\Gamma(\omega) \quad (0 < c < \infty),$$
with $\omega/\theta = \bar c$ the mean value of $c$. For fixed $c$ the f.m.g.f. is $e^{cQ}$, where $Q(u, q) = \sum_{j=1}^{\infty}u^jq^j/(j!\,j)$.
The resulting generalization is then
$$\Phi(u) = \theta^{\omega}\int_0^{\infty}e^{-c(\theta-Q)}c^{\omega-1}\,dc/\Gamma(\omega) = (1-\bar cQ/\omega)^{-\omega}. \qquad (20)$$

The factorial cumulants are
$$\kappa_{(1)} = \bar cq,$$
$$\kappa_{(2)} = \bar cq^2\Big(\frac12+\frac1\theta\Big),$$
$$\kappa_{(3)} = \bar cq^3\Big(\frac13+\frac{3}{2\theta}+\frac{2}{\theta^2}\Big),$$
and
$$\kappa_{(1)}\kappa_{(3)}/\kappa_{(2)}^2 = (4\theta^2+18\theta+24)/(3\theta^2+12\theta+12)$$

takes values in the range $\tfrac43$ to 2. At one extreme the distribution is identical with that of
§3-14, and at the other with the Pascal distribution. By reason of the expansion of $Q(z-1, q)$
already considered (§3-14c), the p.g.f. of the distribution may be written in the form

$$G(z) = \Big[\frac{1-\sum\theta_r}{1-\sum\theta_rz^r}\Big]^{\omega}, \qquad (21)$$

where the coefficients are obtained either by employing tables of the Incomplete Gamma
Function or alternatively by
$$\theta_r = s_r\Big/\Big(\frac{\omega}{\bar c}+\sum_{j=1}^{\infty}s_j\Big),$$
where
$$s_r = \frac1r\,e^{-q}\sum_{k=r}^{\infty}\frac{q^k}{k!}.$$

3-21b. Graduated values may be obtained if required by the systematic expansion of
$G(z)$. Since the $\theta$’s are small the powers of $\theta$ soon vanish, and of course no coefficient of $z^x$ in
$\sum\theta_rz^r$ is needed for $x >$ maximum variate value required.
There is a useful independent check on the probability that $x = 0$, for

$$G(0) = \Big[1-\frac{\bar c}{\omega}\,Q(-1, q)\Big]^{-\omega}$$
and
$$Q(-1, q) = \int_0^q(e^{-t}-1)\frac{dt}{t} = -(C+\log q-\mathrm{Ei}(-q)), \qquad (22)$$

where $C$ is Euler’s constant and $-\mathrm{Ei}(-z)$ is the exponential integral $\int_z^{\infty}e^{-t}\,dt/t$ which has
been tabulated.

Rough estimates of the parameters are readily obtained, using the sample values of the first
three factorial cumulants as population cumulants. In practice I have found it preferable
to modify them so as to satisfy the first two factorial cumulants and the observed frequency
at $x = 0$, for if quadrat size is small (as it has been assumed to be) this class is by far the
greatest and its weight considerable.

For any given trial value of $q$ we can find in succession

(i) $\bar c = \kappa_{(1)}/q$, (ii) $\bar c/\omega = \kappa_{(2)}/(q\kappa_{(1)})-\tfrac12$,
(iii) $\omega$ from (i) and (ii), (iv) $Q(-1, q)$ by (22), (v) $G(0)$.

As an illustration, the following trials were made during the graduation of the Plantago
data given earlier:

          By moments    1st trial    2nd trial    3rd trial
q         1.5717        1.5          1.0          0.6
NG(0)     238.9         238.7        237.1        235.37

The fitted values with $q = 0.6$ are shown as Graduation II in Table 2.
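
The succession of steps (i) to (v) can be sketched as below. The example is an illustration under stated assumptions: $Q(-1, q)$ is evaluated from its power series $\sum_{j\ge1}(-q)^j/(j\cdot j!)$ rather than from tables, $G(0)$ is taken as $[1-(\bar c/\omega)Q(-1,q)]^{-\omega}$, and the sample factorial cumulants are those computed from the observed frequencies of Table 2 (total 400, with frequency 4 at $x = 6$). The function names are illustrative.

```python
import math

def Q_minus_one(q, terms=60):
    """Q(-1, q) = sum_{j>=1} (-q)^j / (j * j!)."""
    return sum((-q) ** j / (j * math.factorial(j)) for j in range(1, terms + 1))

def expected_zero_class(q, k1, k2, N):
    """Steps (i)-(v): N*G(0) for a trial value of q."""
    c_bar = k1 / q                          # (i)
    c_over_omega = k2 / (q * k1) - 0.5      # (ii)
    omega = c_bar / c_over_omega            # (iii)
    Q = Q_minus_one(q)                      # (iv)
    return N * (1 - c_over_omega * Q) ** (-omega)  # (v)

# First two sample factorial cumulants of the Plantago data; N = 400.
k1, k2, N = 0.8525, 1.12324375, 400
for q in (1.5, 1.0, 0.6):
    print(q, round(expected_zero_class(q, k1, k2, N), 2))
```

Lowering the trial value of $q$ brings $NG(0)$ down towards the observed frequency of 235 in the zero class.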


3-22. In the case of contagious distributions an additional complication arises when the 
individuals of the same species differ in their reproductive potential. When these differences 
are an expression of genetic variability within the species they are displayed under constant 
environmental conditions. 

As an illustration consider the effect of such variation on the model of §3-9, where the
resulting distribution is Neyman Type A. The distribution of the number of individuals
per cluster will now be

$$g(z) = \int_0^{\infty}\exp\{m(z-1)\}\,dF(m).$$

Under the hypothesis that $m$ has a Gamma distribution (compare §3-20), the distribution
of the number of individuals per quadrat, which is $G(z) = \exp\{\lambda(g(z)-1)\}$, will be seen to
be the general Polya-Aeppli (§3-11).

As a further example, suppose that in the model of §3-14a, q = Mh*v has distribution
function $F(q)$ and m.g.f. $\phi(t)$.

We then have

$$\Psi(u) = c\int_0^{\infty}\!\!\int_0^1[\exp\{uqt\}-1]\,\frac{dt}{t}\,dF(q) = c\int_0^1[\phi(ut)-1]\frac{dt}{t} = c\sum_{r=1}^{\infty}\frac{\mu'_ru^r}{r\cdot r!},$$
where the $\mu'_r$ refer to the distribution of $q$.
Biometrika 39 24 
In the special case where $q$ has an exponential distribution,

$$\phi(t) = \chi/(\chi-t),$$

$$\Psi(u) = c\int_0^1\frac{u\,dt}{\chi-ut} = -c\log\Big(1-\frac{u}{\chi}\Big),$$

and $\Phi(u) = [1-u/\chi]^{-c}$, the f.m.g.f. of the resulting Pascal distribution.





4. SUMMARY AND CONCLUSIONS 


1. A number of distributions arising in quadrat sampling are considered in relation to 
the underlying pattern of organisms. 

2. It is most noticeable that the same distribution may arise from several quite distinct 
models. 

3. Satisfactory graduations of frequency data are usually possible on a wide variety of 
alternative hypotheses. 

4. Whether a given model is appropriate must be determined in the light of additional 
evidence of a different kind. A few ways are briefly suggested as to how this problem might 
be approached. 
INTRODUCTION


The maximum-likelihood equations for estimating the death-rate among a population of 
wild animals from data obtained by means of the capture-recapture method have been 
given in a previous paper (Leslie & Chitty, 1951), and in the particular case studied the 
simplifying assumption was made that the death-rate per unit of time in the population as 
a whole remained constant over the period of sampling. The estimation of the death-rate 
is, however, only a preliminary step in the solution of the general problem, which also 
includes the estimation of the total numbers alive in the population at some given time. 
It is evident that the solution of this problem in any particular case will depend very 
greatly on our information regarding the population which was being sampled. Thus, for 
example, if we know that no dilution was occurring through the entry of new individuals 
into the population at risk of capture, then only two parameters need be determined when 
the death-rate is assumed to remain constant, namely, the number $N_0$ alive at the initial
sampling and the constant survival factor P per unit of time. These conditions might be 
fulfilled, for instance, when a population was being sampled during the non-breeding season. 
If, however, a preliminary analysis of the data shows that dilution was in fact occurring, or 
we have reason to assume this on general biological grounds, then it will be necessary to 
estimate the number $N_t$ alive in the population at successive intervals of time. In another
case we may suspect that the assumption of a constant death-rate is not justified, and here 
again the solution of the problem clearly depends on whether we were dealing with a popula- 
tion which was decreasing merely through deaths, or whether in addition it was being 
recruited by births and immigrants. Since the precision of any estimates which are made
from a given set of data depends very greatly on the choice of the model appropriate to the 
particular population sampled, the estimation of total numbers is discussed in this paper 
under several separate headings, which comprise most of the cases commonly arising in the 
analysis of field data. It is assumed throughout that the sampling of the population was 
satisfactory, namely, that it was entirely at random; that the method of marking the 
animals did not affect their subsequent chances of recapture; and that marked and unmarked 
animals were recaptured with equal facility, or, in other words, that there was no differential 
trap-shyness or preference between the two groups. 

In order to illustrate some of these different methods of estimation on an actual numerical 
example, the data from the same sampling experiment will be used as in the previous paper, 
where a full description, together with the detailed results, will be found (Leslie & Chitty, 
1951, §6 and Appendix III). Briefly, an artificial population which consisted initially of 
300 numbered counters decreased owing to the operation of a constant ‘death-rate’, and 
five samples, each of fifty counters, were taken at equidistant intervals of time. The whole 
chain of five samplings was repeated, from the beginning, twenty times. Thus, in the case 
of some particular parameter, we can compare both the observed mean of the twenty 
independent estimates with the known, true value, and also the inter-replicate variance, 
based on nineteen degrees of freedom, with the theoretical variance obtained by inserting 
the expected numerical values in the appropriate maximum-likelihood equations. 

Since it will frequently be necessary in what follows to cross-refer to various sections of 
the earlier paper (Leslie & Chitty, 1951), in future such references will be given for con- 
venience in the form (I, §z). 


1. THE ESTIMATION OF TOTAL NUMBERS WHEN THE DEATH-RATE IS ASSUMED TO REMAIN 
CONSTANT OVER THE PERIOD OF SAMPLING, AND DILUTION OF THE POPULATION IS 
OCCURRING 


The first case to be considered is the general one in which between any two successive
samples new individuals, in varying numbers, may have entered the population at risk of
capture. If this population consists of a total of $N_t$ individuals at time $t$, then, adopting
a deterministic model, $PN_t$ of these will be alive at time $t+1$, assuming that the death-rate*
per unit of time, $Q = 1-P$, remains constant over the period of sampling, and that the
samples are taken at equidistant intervals. Let

$B_tN_t$ = the number of new individuals entering the population during the interval $t$ to
$t+1$, which are alive at time $t+1$.

This group of new individuals surviving at $t+1$ may consist of immigrants, of young born
during the interval, or of young which have grown up to enter the population at risk of
capture. Much depends on whether we are sampling over all possible age groups, or only
a selection of these, such as the adult age classes. The parameter $B_t$ may be termed the
dilution factor and is not to be confused with the ordinary birth-rate. The change in
population numbers over the interval $t$ to $t+1$ is thus given by

$$N_{t+1} = (P+B_t)N_t = \lambda_tN_t, \text{ say};$$

and we require estimates of $N_t$, $B_t$ and $P$, together with their standard errors. The solution
of this problem depends essentially on the estimation of $N_t$, $N_{t+1}$ and $P$. Once these are
determined we can arrive at an estimate of $\lambda_t$ and finally $B_t$.

* For brevity the expression ‘death-rate’ is used, both here and throughout the remainder of this
paper, as a comprehensive term which would also include any emigration of individuals from the
population.

We will suppose that a population consisting of a variable number ($N_t$) of individuals is
sampled at equidistant intervals of time $t = 0, 1, \dots, T$, and that at each sampling $R_t$
individuals are captured, marked and returned to the population. (It is assumed in the
theoretical development that no animals are either accidentally killed by this procedure,
or removed permanently from the population.) Each sample of $R_t$ consists of $u_t$ unmarked
individuals and $s_t$ recaptures bearing one or more marks, and throughout the present
discussion it will be assumed that the recaptures at time $t$ are grouped according to the
interval of time since they were last captured. This is the system of grouping termed
Method B in the previous paper, and it has been shown that no information is lost by
grouping the individuals in this way (I, §4). Thus, among the $s_t$ recaptured at time $t$, let
$m_{xt}$ be the number which were last captured at time $x$ ($x = 0, 1, 2, \dots, (t-1)$; $\sum_x m_{xt} = s_t$).

Adopting a purely deterministic model of the chain of events, in which the constant death-rate
per unit of time, $Q = 1-P$, is assumed to fall equally on all subclasses in the population,
the expected total number of marked individuals in the population as a whole at time $t$ is
a polynomial function of the parameter $P$,

$$\phi_t(P) = P^tR_0+P^{t-1}u_1+P^{t-2}u_2+\dots+Pu_{t-1}.$$


When the recaptures are grouped according to the interval of time since they were last
captured, the expected number in each class in the population as a whole is also a polynomial
function of $P$, say $f_{xt}(P)$, with $\sum_x f_{xt}(P) = \phi_t(P)$ (I, §4). Writing down the expected number
in each class in the population at time $t$, together with the number observed in the sample of
$R_t$ taken at that time, we have

Class                           Expected in population    Observed in sample
Unmarked                        N_t - φ_t(P)              u_t
Marked, last captured at 0      f_{0t}(P)                 m_{0t}
Marked, last captured at 1      f_{1t}(P)                 m_{1t}
...                             ...                       ...
Marked, last captured at t-1    f_{t-1,t}(P)              m_{t-1,t}
Total                           N_t                       R_t

Since, in practice, the size of each sample is likely to be relatively small compared with the
population at risk, we may assume that the probability of obtaining the observed result at
each sampling is given by the appropriate term of a multinomial distribution. Then, summing
over all values of $t$ from 1 to $T$, the log likelihood of the combined series of results is

$$L = \sum_t u_t\log(N_t-\phi_t)+\sum_t\sum_x m_{xt}\log(f_{xt})-\sum_t R_t\log N_t,$$

from which we have

$$\frac{\partial L}{\partial N_t} = \frac{u_t}{N_t-\phi_t}-\frac{R_t}{N_t} = 0 \quad (t = 1, 2, \dots, T),$$
$$\frac{\partial L}{\partial P} = -\sum_t\frac{u_t\phi'_t}{N_t-\phi_t}+\sum_t\sum_x\frac{m_{xt}f'_{xt}}{f_{xt}} = 0, \qquad (1\text{-}1)$$

where $\phi'_t = d\phi_t/dP$ and $f'_{xt} = df_{xt}/dP$.











366 The estimation of population parameters 


From the first of these equations,

$$N_t = R_t\phi_t/s_t \quad (t = 1, 2, \dots, T). \qquad (1\text{-}2)$$
Then, substituting this value in the second equation,
$$\frac{\partial L}{\partial P} = -\sum_t\frac{s_t\phi'_t}{\phi_t}+\sum_t\sum_x\frac{m_{xt}f'_{xt}}{f_{xt}} = F(P) = 0, \qquad (1\text{-}3)$$

which is the equation for estimating $P$ when only the recaptures at times $t = 1, 2, \dots, T$ are
considered (I, §5, Method B). Thus, the problem of the simultaneous estimation of the
$N_t$ and $P$ reduces to that of first estimating $P$ from the observed recaptures and then of using
this estimate to determine the $N_t$ from (1-2).
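
The second stage of this procedure, the calculation of $\phi_t$ and hence of $N_t$ from (1-2) once $P$ has been estimated, can be sketched as follows. The recapture figures and the value of $P$ used here are hypothetical, chosen only to show the arithmetic; the solution of (1-3) for $P$ is not attempted, and the function names are illustrative.

```python
def phi(P, R0, u, t):
    """phi_t(P) = P^t R_0 + P^(t-1) u_1 + ... + P u_(t-1); u[i] holds u_(i+1)."""
    return P**t * R0 + sum(P ** (t - j) * u[j - 1] for j in range(1, t))

def phi_prime(P, R0, u, t):
    """Derivative of phi_t with respect to P."""
    return t * P ** (t - 1) * R0 + sum(
        (t - j) * P ** (t - j - 1) * u[j - 1] for j in range(1, t)
    )

def estimate_N(P, R0, u, R, s, t):
    """Equation (1-2): N_t = phi_t R_t / s_t; R[i], s[i] hold R_(i+1), s_(i+1)."""
    return phi(P, R0, u, t) * R[t - 1] / s[t - 1]

# Hypothetical record: an initial marking of 50, then four samples of 50.
P_hat, R0 = 0.84, 50
R = [50, 50, 50, 50]
s = [8, 17, 24, 32]                   # recaptures in each sample
u = [r - m for r, m in zip(R, s)]     # unmarked animals in each sample
for t in range(1, 5):
    print(t, round(phi(P_hat, R0, u, t), 2), round(estimate_N(P_hat, R0, u, R, s, t), 1))
```

The polynomial need not be expanded afresh at each step, since $\phi_t = P(\phi_{t-1}+u_{t-1})$.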

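The information matrix obtained from the set of equations (1-1) is of the form

$$M = \begin{bmatrix} a_{11} & 0 & \cdots & 0 & a_{1P}\\ 0 & a_{22} & \cdots & 0 & a_{2P}\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & a_{TT} & a_{TP}\\ a_{1P} & a_{2P} & \cdots & a_{TP} & a_{PP}\end{bmatrix},$$

where
$$a_{tt} = -E\Big(\frac{\partial^2L}{\partial N_t^2}\Big) = \frac{R_t\phi_t}{N_t^2(N_t-\phi_t)},$$
$$a_{tP} = -E\Big(\frac{\partial^2L}{\partial N_t\,\partial P}\Big) = -\frac{R_t\phi'_t}{N_t(N_t-\phi_t)},$$
and
$$a_{PP} = -E\Big(\frac{\partial^2L}{\partial P^2}\Big).$$

Inverting this matrix, we have for the asymptotic variance of $P$,

$$\frac{1}{V(P)} = a_{PP}-\sum_{t=1}^{T}\frac{a_{tP}^2}{a_{tt}},$$
which may be shown to be the same as the result obtained by differentiating (1-3) with
regard to $P$, namely, $-\partial F(P)/\partial P = 1/V(P)$. The variance of the estimate $P$ is thus obtained
from the iterative solution of (1-3) as illustrated in Leslie & Chitty (1951). For the asymptotic
variance of $N_t$, we have

$$V(N_t) = \frac{1}{a_{tt}}+\Big(\frac{a_{tP}}{a_{tt}}\Big)^2V(P).$$

Thus, putting $\pi_t = \phi_t/N_t$,
$$V(N_t) = N_t^2\Big[\frac{1-\pi_t}{\pi_tR_t}+\Big(\frac{\phi'_t}{\phi_t}\Big)^2V(P)\Big],$$
where
$$\phi_t = P^tR_0+P^{t-1}u_1+P^{t-2}u_2+\dots+Pu_{t-1},$$
and
$$\phi'_t = tP^{t-1}R_0+(t-1)P^{t-2}u_1+(t-2)P^{t-3}u_2+\dots+u_{t-1}.$$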
In practice, the estimated values $N_t$, $P$ and $V(P)$ are inserted in this expression for $V(N_t)$,
together with the observed proportion of marked animals in the sample at time $t$, namely, $p_t = s_t/R_t$.
The various covariances, if required, are given by
$$\mathrm{cov}(N_t, P) = N_t\frac{\phi'_t}{\phi_t}V(P),$$
$$\mathrm{cov}(N_t, N_j) = N_tN_j\frac{\phi'_t\phi'_j}{\phi_t\phi_j}V(P).$$


In applying these equations, however, there is one point which should be mentioned in
passing. It will be noticed that according to (1-2) an estimate may be made of $N_T$, the
number alive in the population at the time of the last sample in the chain. Now, it may be
shown, by supposing that $P$ changes to $P'$ during the interval $T-1$ to $T$, that the estimate
of $P$ which is made from the distributions of recaptures must refer to the span of time
between the original and penultimate sampling, and that no estimate can be made of the
survival factor during the last interval. It follows, therefore, that strictly speaking the
value of $N_T$ also cannot be determined. In calculating $N_T$ from (1-2) we are extrapolating
the value of $P$ over the interval $T-1$ to $T$, and thus the use of this estimate together with
its variance must always be attended by a certain degree of uncertainty.

A more important point, however, is the question of whether the estimates made by means
of
$$N_t = \phi_tR_t/s_t$$
are biased and, if so, whether there is any unbiased estimator which can be used in its place.
The same problem has been recently discussed by Chapman (1951) and Bailey (1951) in the
case of the so-called ‘Lincoln Index’, or sample census. This type of census, which in its
simplest form is a special case of the more general type discussed here, can be employed for
a population in which no dilution is occurring through births and immigrants. Thus, under
these conditions, if $R_0$ individuals out of a total $N_0$ are captured, marked and returned to
the population, and at the first resampling $s_1$ marked individuals are found in a sample of
$R_1$, then the maximum-likelihood estimate of $N_0$ is given by

$$N_0 = R_0R_1/s_1, \qquad (1\text{-}4)$$
which is of a similar form to the more general equation given above. Both Chapman and
Bailey show that this estimate of $N_0$ is positively biased, so that in the long run we would
tend to overestimate the number of individuals in the population. Arguing by analogy,
therefore, we might expect the more general form also to be positively biased, and in fact
this was observed to be the case in all the mean estimates of total numbers for the sampling
experiment which were made, not only from (1-2), but also from estimators of a similar type
arising later in this discussion.

There is, of course, one way out of this difficulty both in the Lincoln Index and in the
present problem, and that is to estimate the size of population in terms of the reciprocal
$X_t = N_t^{-1}$, since the observed proportion of marked individuals in the sample is an unbiased
estimate of the proportion in the population. There are, however, certain disadvantages in
calculating the size of population in these terms, more particularly when it is necessary to
combine a number of independent estimates together in order to obtain, for instance, the
total population at a given time. We require some relatively unbiased estimate of $N_t$ together
with its standard error. Now, in the case of the Lincoln Index, Bailey (1951, §2) has
suggested that, instead of the maximum-likelihood estimate (1-4), we may use

$$\hat N_0 = R_0(R_1+1)/(s_1+1).$$













He shows that the relative bias of this estimate is quite small, even for moderate $E(s_1)$, and
that an approximately unbiased estimate of the variance is given by

$$V(\hat N_0) = \frac{R_0^2(R_1+1)(R_1-s_1)}{(s_1+1)^2(s_1+2)}.$$

This form appears to be readily extendible to the more general case. Thus, in place of (1-2),
we might use the adjusted estimate

$$\hat N_t = \phi_t(R_t+1)/(s_t+1), \qquad (1\text{-}5)$$
with variance
$$V(\hat N_t) = \hat N_t^2\Big[\frac{R_t-s_t}{(R_t+1)(s_t+2)}+\Big(\frac{\phi'_t}{\phi_t}\Big)^2V(P)\Big].$$


In this way we should obtain a more satisfactory estimate of the total numbers, provided
that the $\phi_t$ are unbiased, and this in turn will presumably depend on whether there is any
bias in the estimate $P$ of the survival factor. Only empirical evidence can be offered here
to show that estimates of $\phi_t$ are satisfactory in this respect, at least under conditions of
sampling corresponding to those in the experiment. Thus, for example, the values of
$\phi_t$ ($t = 1, 2, 3, 4$) were calculated for each of the twenty replicates, using the values of the
survival factor $P$ which have already been given (I, §8, Table 2) and the observed numbers
of unmarked counters at each sampling. The following were the observed means of the
twenty values of $\phi_t$ contrasted with those expected:

t    Expected φ_t    Observed φ_t
1    42.000          41.950
2    70.386          70.486
3    87.091          87.638
4    94.681          95.750

Similarly, for the three-point sampling (I, §8), when only the samples taken at $t = 0$, 2 and 4
were considered, and thus the estimates of the survival factor were subject to a much
greater degree of error, the observed values of $\phi_t$ at the two resamplings were 34.210 and 55.901, compared with
expected values of 35.334 and 54.483 respectively. Evidently, judging from these results,
there is little evidence for any marked bias in the estimates of $\phi_t$ when sampling on this
scale from a population of a few hundred individuals.

The total number of counters in the urn was then calculated by means of (1-2) and (1-5),
in order to compare the relative bias of these two estimates. Since, in this experiment, it
was known that the survival factor had not changed appreciably during the last interval
in the chain, the values of $N_t$ were also estimated for $t = 4$, when the last sample was taken.
This extrapolation, however, might be attended by a certain amount of risk in the case of
some unknown population, as has been pointed out earlier in this section. The following
were the observed means of the twenty estimates for $t = 1, 2, 3, 4$, together with the
expected number of marked counters $E(s_t)$:

                 True value    Observed means       Range of estimates
t    E(s_t)      of N_t        (1-2)     (1-5)      (1-2)      (1-5)
1     8.3        252           298.9     259.1      157-922    149-627
2    16.6        212           221.4     211.7      140-300    137-285
3    24.5        178           182.9     178.8      133-299    131-289
4    31.6        150           155.4     153.4      100-195     99-193
It will be seen that the estimates $N_t$ are on the average positively biased, and that the
degree of this bias becomes progressively less with increasing $E(s_t)$. The relatively high mean
value for $N_1$ was mostly due to a very large estimate of $N_1 = 922$ for one replicate in which
only two recaptures were found at the first resampling. In comparison, the estimates $\hat N_t$
are relatively unbiased, and the use of this estimate in place of $N_t$ is therefore likely in the
long run to lead to more satisfactory results, more particularly when there is only a small,
or a moderately small, number of recaptures. For large $E(s_t)$, of course, the two estimates
are equivalent.

The comparison between the expected and observed variances of these estimates raises
a point which may be of some importance from the practical point of view. In the theoretical
development it was assumed that the probability of obtaining the observed results at each
sampling is given by the appropriate term of a multinomial distribution, whereas, strictly
speaking, a multihypergeometric distribution should have been considered. Two sets of
values for the theoretical variances are therefore given in the following table. The first of
these ($\sigma_M^2$) was obtained by inserting the known, expected values in the equations which
have been given for $V(\check N_t)$, while in the second set ($\sigma_H^2$) allowance was made for the fact that
samples of a given size were being withdrawn from a finite population. Taking $\check N_t$ as the
better of the two estimates of total numbers, the following were the observed variances
between replicates, each based on 19 degrees of freedom. (The variances of the $\hat N_t$ were all
greater than these.)


t     $100R_t/N_t$    $\sigma_M^2$    $\sigma_H^2$    $s^2(\check N_t)$
1     19.8            6793            5414            10133
2     23.6            2512            1874             2421
3     28.1            1562            1095             1219
4     33.3            1289             880              963


Considering the successive values of $s^2(\check N_t)$, it will be seen that the first of these is greater
than $\sigma_M^2$, though this may be due in part to the rather extreme estimate of $\check N_1 = 627$. The
subsequent values fall between $\sigma_M^2$ and $\sigma_H^2$, gradually approaching the latter as the
expectations $E(s_t)$, and the proportions of the population which were sampled ($R_t/N_t$), become
greater. The reason why these observed variances are all greater than $\sigma_H^2$ is probably
that the actual procedure adopted in the experiment differed somewhat from the
deterministic model on which the theoretical variances are based. The removal of a number
of counters chosen at random between each sampling, to represent the number of deaths
which were assumed to occur, introduced an additional random element, the general effect
of which would be to increase the variability between replicates (I, §7; Moran, 1952).
A similar effect might also be expected in sampling from a biological population, in which
the death-rate must be regarded, strictly speaking, as a probability rather than a constant
parameter falling equally on all subclasses in the population, as is assumed in the
deterministic model. The practical conclusion suggested by the results of this experiment is that
it would be better not to adjust any variances estimated by means of the equations given
here, in order to allow for the finite nature of the population, unless the size of each sample
forms a relatively high proportion of the total numbers. Examples do occasionally
arise in field work in which it is evident from internal evidence that some 50-70 % of the
population was being sampled each time. Here some adjustment seems to be called for, and
the simplest way of doing this is to obtain a pooled estimate of the fraction ($f$) of the
population sampled over the period of time 1 to $T - 1$, and multiply the estimated variances
of the $\check N_t$ by the quantity $1 - f$. This will probably be accurate enough for all ordinary
purposes, provided that the individual fractions do not vary greatly.

Having thus determined the $N_t$ for $t = 1, 2, 3, ..., T-1$, and the constant survival factor $P$,
we may then proceed to the estimation of $A_t = (P + B_t)$ for $t = 1, 2, 3, ..., T-2$, and hence
of the dilution factors $B_t$, if these parameters are required. Thus

$$A_t = N_{t+1}/N_t,$$

and for large samples,

$$V(A_t) = \frac{N_{t+1}^2}{N_t^4}\,V(N_t) + \frac{1}{N_t^2}\,V(N_{t+1}) - 2\,\frac{N_{t+1}}{N_t^3}\,\operatorname{cov}(N_t, N_{t+1}).$$
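
The coefficients in this large-sample expression come from a first-order Taylor expansion of the ratio. The step is sketched here for reference; it is the standard argument and is not spelled out in the text itself.

```latex
\begin{aligned}
\frac{\partial A_t}{\partial N_t} &= -\frac{N_{t+1}}{N_t^{2}}, \qquad
\frac{\partial A_t}{\partial N_{t+1}} = \frac{1}{N_t}, \\
V(A_t) &\approx \Bigl(\frac{\partial A_t}{\partial N_t}\Bigr)^{2} V(N_t)
  + \Bigl(\frac{\partial A_t}{\partial N_{t+1}}\Bigr)^{2} V(N_{t+1})
  + 2\,\frac{\partial A_t}{\partial N_t}\,\frac{\partial A_t}{\partial N_{t+1}}\,
    \operatorname{cov}(N_t, N_{t+1}) \\
&= \frac{N_{t+1}^{2}}{N_t^{4}}\,V(N_t) + \frac{1}{N_t^{2}}\,V(N_{t+1})
  - 2\,\frac{N_{t+1}}{N_t^{3}}\,\operatorname{cov}(N_t, N_{t+1}).
\end{aligned}
```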





Using the estimates $\check N_t$ and $\check N_{t+1}$, together with their variances and covariance, we have
$$\check A_t = \check N_{t+1}/\check N_t,$$


and for the variances of this estimate, the above expression reduces to 


VA) = Ha ata + (Set—- 2) vy], 


Drs 
where $x_t = (R_t - s_t)/\{(R_t + 1)(s_t + 2)\}$.
Finally, $B_t = A_t - P$,


the variance of this estimate being 


V(B,) = V(A)- [2 (*s -$) - 1] V(P), 


dp; 2 A 
or alternatively, V(B,) = A}(x, +2, 41) + [A (s- ) ~ 1] V(P) 
141 


2. WHEN NO DILUTION IS OCCURRING, AND THE DEATH-RATE IS CHANGING


Hitherto it has been assumed that the death-rate per unit of time in the population as
a whole remains constant over the period of sampling. It is now of interest to relax this
condition and to consider first of all the results of a chain of samples from a population in
which no dilution is occurring through births and immigrants, and in which the death-rate
may vary from interval to interval. This problem arises in practice, not only in the general
case of sampling during the non-breeding season from a population consisting of an unknown
number of individuals, but also when we wish to determine the number of survivors at
successive intervals of time of a known number of individuals alive at a certain date. Since
in the latter case we are considering only a particular group of animals, the dilution factors
are necessarily zero. It is, however, convenient to develop the methods of estimation in
terms of an unknown population from which samples of $R_0, R_1, ..., R_T$ individuals are taken
at equidistant intervals of time. (This last restriction is not really necessary, but is imposed
simply for convenience, so that we may adopt this interval as the unit of time in which the
death-rates are expressed. If the sampling is at unequal intervals, we merely regard the
successive samples as being taken at times $t_0, t_1, t_2, ..., t_T$ and label them accordingly by
these suffixes.)

Let $N_0$ = the total number of individuals in the population at some origin of time,
and $P_t$ = the survival factor over the interval of time $t$ to $t + 1$.

The values of $P_t$ are now not necessarily the same for each interval, and for products of the
form $P_0P_1P_2P_3 \ldots$ we will write $P_{0123\ldots}$. Since it is here assumed that all the dilution factors
$B_t = 0$, the total number in the population at time $t$ is

$$N_t = P_{012\ldots(t-1)}\,N_0.$$


(a) Method A grouping 
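We may first of all estimate $N_0$ by a simple extension of the ordinary Lincoln Index. Thus,
the sample of $R_t$ will consist of $k_{0t}$ individuals marked, and $R_t - k_{0t} = u_{0t}$ not marked, at
$t = 0$. Considering only these two classes, we have in the population at time $t$ the following
expected number of survivors from the total of $R_0$ marked at the time of the first sampling,
together with the number observed in the sample of $R_t$:

                        Expected in population             Observed in sample
Not marked at t = 0     $P_{012\ldots(t-1)}(N_0 - R_0)$    $u_{0t}$
Marked at t = 0         $P_{012\ldots(t-1)}R_0$            $k_{0t}$
Total                   $P_{012\ldots(t-1)}N_0$            $R_t$

The combined probability of obtaining the series of results for the $u_{0t}$ and $k_{0t}$ in the samples
taken over the period of time 1 to $T$ is then easily seen to be

$$\prod_{t=1}^{T} \binom{R_t}{k_{0t}} \left(\frac{N_0 - R_0}{N_0}\right)^{u_{0t}} \left(\frac{R_0}{N_0}\right)^{k_{0t}},$$

and the essential part of the log likelihood is therefore

$$L = \sum_{1}^{T} u_{0t} \log (N_0 - R_0) - \sum_{1}^{T} R_t \log N_0.$$

Then

$$\frac{\partial L}{\partial N_0} = \frac{\sum u_{0t}}{N_0 - R_0} - \frac{\sum R_t}{N_0}, \qquad
-\frac{\partial^2 L}{\partial N_0^2} = \frac{1}{N_0^2}\left[\frac{\sum R_t \sum k_{0t}}{\sum u_{0t}}\right],$$

from which we have

$$\hat N_0 = R_0 \sum_{1}^{T} R_t \Big/ \sum_{1}^{T} k_{0t},$$

and

$$V(\hat N_0) = -1 \Big/ \frac{\partial^2 L}{\partial N_0^2}.$$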
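
For reference, the algebra between the score equation and the estimate can be written out as follows. This is a sketch of the intermediate steps, using only $u_{0t} = R_t - k_{0t}$; the explicit form of the variance in the last line is obtained by inverting the second derivative evaluated at $\hat N_0$.

```latex
\begin{aligned}
\frac{\partial L}{\partial N_0} = 0
 &\;\Longrightarrow\; N_0 \sum u_{0t} = (N_0 - R_0) \sum R_t \\
 &\;\Longrightarrow\; N_0 \Bigl(\sum R_t - \sum u_{0t}\Bigr) = R_0 \sum R_t \\
 &\;\Longrightarrow\; \hat N_0 = R_0 \sum R_t \Big/ \sum k_{0t}, \\
V(\hat N_0) &= \hat N_0^{2}\,\frac{\sum u_{0t}}{\sum R_t \,\sum k_{0t}} .
\end{aligned}
```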


Just as in the case of the simple Lincoln Index discussed in the previous section, we might
expect this estimate $\hat N_0$ to be positively biased, and in its place, following Bailey (1951), we
may use the adjusted estimate

$$\check N_0 = R_0\left(\sum_{1}^{T} R_t + 1\right) \Big/ \left(\sum_{1}^{T} k_{0t} + 1\right),$$

with variance

$$V(\check N_0) = \check N_0^2\left[\sum_{1}^{T} u_{0t} \Big/ \left\{\left(\sum_{1}^{T} R_t + 1\right)\left(\sum_{1}^{T} k_{0t} + 2\right)\right\}\right].$$

Consider now the survivors of the sample of $R_1$ marked at the first resampling from the
population consisting of $N_1 = P_0N_0$ individuals. There will be $k_{1t}$ recaptures of this class in
the samples taken at $t = 2, 3, ..., T$, and thus $R_t - k_{1t} = u_{1t}$ individuals not marked at $t = 1$.
By a similar argument we have

$$\check N_1 = R_1\left(\sum_{2}^{T} R_t + 1\right) \Big/ \left(\sum_{2}^{T} k_{1t} + 1\right),$$

and hence, in general, the estimated number of individuals alive in the population at a
time $x$ which is previous to the sampling at time $t$ is

$$\check N_x = R_x\left(\sum_{t=x+1}^{T} R_t + 1\right) \Big/ \left(\sum_{t=x+1}^{T} k_{xt} + 1\right) \qquad (x = 0, 1, 2, ..., T-1),$$

with

$$V(\check N_x) = \check N_x^2\left[\sum_{t=x+1}^{T} u_{xt} \Big/ \left\{\left(\sum_{t=x+1}^{T} R_t + 1\right)\left(\sum_{t=x+1}^{T} k_{xt} + 2\right)\right\}\right].$$


Since $k_{xt}$ is the number of individuals captured at time $t$ who received a mark at some
previous time $x$, it can be seen that the first step in the actual computations is to group the
recaptures according to Method A (I, §3). The estimates of $N_x$ for $x = 0, 1, 2, ..., T-1$ may
then be obtained from the totals for the successive rows of the table of $k_{xt}$ values.


Numerical example 


The following were the observed $k_{xt}$ for one of the replicates in the sampling experiment (Drawing B):

  x \ t      0     1     2     3     4    Total
  0          .     7     7     7     5     26
  1          .     .    12     9     8     29
  2          .     .     .    10    13     23
  3          .     .     .     .    15     15
  $R_t$     50    50    50    50    50


The succeeding steps are set out in the following table, which gives the necessary figures for calculating
$\check N_x$ and the standard errors of these estimates. In this example it will be noted that since the last
sample in the chain was taken at $t = 4$, the population can only be estimated for $x = t = 0, 1, 2, 3$.
All sums run over $t = x+1, ..., T$.

        (1)      (2)           (3)             (4)             (5)                                (6)                                                True value
  x     $R_x$    $\sum R_t$    $\sum k_{xt}$   $\sum u_{xt}$   $(\sum R_t + 1)(\sum k_{xt} + 2)$  (4) ÷ (5)    $\check N_x \pm \sigma(\check N_x)$   of $N_x$
  0     50       200           26              174             5628                               0.03092      372 ± 65                              300
  1     50       150           29              121             4681                               0.02585      252 ± 42                              252
  2     50       100           23               77             2525                               0.03050      210 ± 37                              212
  3     50        50           15               35              867                               0.04037      159 ± 32                              178
  4     50         .            .                .                .                                     .

The values of $\check N_x$ are obtained from the first three columns. Thus
$$\check N_0 = 50 \times (200 + 1) \div (26 + 1) = 372,$$
and for the standard error, using column (6),
$$\sigma(\check N_0) = 372 \times \sqrt{0.03092} = 372 \times 0.1758 = \pm 65.$$
Similarly,
$$\check N_1 = (50 \times 151) \div 30 = 252,$$
$$\sigma(\check N_1) = 252 \times \sqrt{0.02585} = \pm 42.$$

It is evident that the whole calculation can be carried out very rapidly, once the table of $k_{xt}$ values
has been formed.
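
The tabular working above can be sketched in a few lines of Python. This is an illustrative transcription of columns (1) to (6) for the Drawing B data, not part of the original computation; the function name `method_a_estimates` and the zero-filled layout of `k` are assumptions made for the example, and the last digit of a standard error may differ slightly from the printed table through rounding.

```python
from math import sqrt

# Method A recaptures k[x][t] for Drawing B (rows x = 0..3, columns t = 0..4).
# Cells with t <= x cannot occur and are stored as 0.
k = [
    [0, 7, 7, 7, 5],
    [0, 0, 12, 9, 8],
    [0, 0, 0, 10, 13],
    [0, 0, 0, 0, 15],
]
R = [50, 50, 50, 50, 50]  # sample sizes R_t


def method_a_estimates(k, R):
    """Return a list of (x, adjusted estimate, column (6), standard error)."""
    T = len(R) - 1
    rows = []
    for x in range(T):
        sum_R = sum(R[x + 1:])       # column (2)
        sum_k = sum(k[x][x + 1:])    # column (3)
        sum_u = sum_R - sum_k        # column (4)
        n_check = R[x] * (sum_R + 1) / (sum_k + 1)
        rel_var = sum_u / ((sum_R + 1) * (sum_k + 2))  # column (6) = (4) / (5)
        rows.append((x, n_check, rel_var, n_check * sqrt(rel_var)))
    return rows


if __name__ == "__main__":
    for x, n_check, rel_var, se in method_a_estimates(k, R):
        print(f"x={x}  N={n_check:6.1f}  (6)={rel_var:.5f}  s.e.={se:5.1f}")
```

Only the row totals of the $k_{xt}$ table enter the calculation, which is why the method is so quick once the table has been formed.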


Although this is an extremely convenient method of estimating the number of survivors
in a population of the type we are considering, a difficulty arises if we wish to use these
results in order to estimate the survival factors for the various intervals between the samples.
Thus we have

$$P_t = \check N_{t+1}/\check N_t \qquad (t = 0, 1, 2, ..., T-2),$$

and, for large samples,

$$V(P_t) = P_t^2\left[\frac{V(N_t)}{N_t^2} + \frac{V(N_{t+1})}{N_{t+1}^2} - 2\rho\,\frac{\sqrt{V(N_t)\,V(N_{t+1})}}{N_t N_{t+1}}\right],$$

where $\rho$ is the correlation between $\check N_t$ and $\check N_{t+1}$. Now it is difficult to see how a series of
estimates made from the successive rows of a table of $k_{xt}$ values can be entirely independent.
Thus, to take the above table as an example, the estimate of $N_1$ is based essentially on the
ratio of the total for the second row, 12 + 9 + 8 = 29, to the total for these three samples,
namely, 150. This proportion of occurrences is obviously related to the proportion of
non-occurrences, (150 - 29)/150, which must include some, if not all, of the 7 + 7 + 5 = 19
occurrences in the first row, which contribute to the estimate of $N_0$, and also of the total of
23 in the third row, from which the estimate of $N_2$ is obtained. (The problem of the
inter-relationship of the rows is complicated by the fact that an individual animal caught on
a certain date may have multiple marks. It will thus be included in a number of the $k_{xt}$
figures for time $t$, depending on the number of times it has been previously marked.) On
general grounds, therefore, we might expect a certain degree of correlation between the
successive estimates of $N_t$, and it seems likely that the coefficient will be negative in sign.
If this suggestion is correct, we would tend to underestimate the variance of $P_t$ by taking
$\rho = 0$ in the above expression. However, as a practical procedure, it appears necessary to
do this, since there is no obvious way of obtaining, in any particular case, a measure of the
covariance between successive $\check N_t$ by this method of estimating the total numbers. Unless
this covariance is very marked, we should at least obtain some idea of the relative magnitude
of the error which must be attached to the estimate $P_t$ by taking

$$V(P_t) = P_t^2\left[\frac{V(N_t)}{N_t^2} + \frac{V(N_{t+1})}{N_{t+1}^2}\right].$$


Out of interest the correlation coefficients were calculated for the successive estimates
$\check N_0$, $\check N_1$, $\check N_2$ and $\check N_3$ in the case of the sampling experiment. The three values, each based on
twenty pairs of observations, were $r_{01} = -0.265$, $r_{12} = +0.007$ and $r_{23} = -0.333$, of which
none, considered separately, would be reckoned as significant. Overall there is a suggestion
of a small negative correlation, though the degree is so little marked that we should not be
led into any serious error by assuming it to be zero. Thus, for the above numerical example,
we would have

$$P_0 = 0.677 \pm 0.161,$$
$$P_1 = 0.833 \pm 0.198,$$
$$P_2 = 0.757 \pm 0.202.$$
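
These three values follow directly from the rounded estimates and column (6) of the Method A working table. The sketch below reproduces them; the function `survival_factor` and its `rho` argument are names chosen for this illustration, with `rho = 0` corresponding to the practical procedure adopted in the text.

```python
from math import sqrt

# Rounded adjusted estimates and column (6) relative variances from the
# Method A worked example (Drawing B), for x = 0..3.
N = [372.0, 252.0, 210.0, 159.0]
REL_VAR = [0.03092, 0.02585, 0.03050, 0.04037]


def survival_factor(N, rel_var, t, rho=0.0):
    """Return (P_t, standard error) for P_t = N[t+1] / N[t].

    rho is the assumed correlation between successive estimates of N.
    """
    p = N[t + 1] / N[t]
    a, b = rel_var[t], rel_var[t + 1]
    # Relative variance of a ratio: V(N_t)/N_t^2 + V(N_{t+1})/N_{t+1}^2
    # minus twice rho times the geometric mean of the two terms.
    rel = a + b - 2.0 * rho * sqrt(a * b)
    return p, p * sqrt(rel)


if __name__ == "__main__":
    for t in range(3):
        p, se = survival_factor(N, REL_VAR, t)
        print(f"P_{t} = {p:.3f} +/- {se:.3f}")
```

Passing a negative `rho` shows how much the standard error would rise if the successive estimates were in fact negatively correlated.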


It will be seen that the errors are relatively very great in sampling on this scale, amounting
on the average to about 25 % of the estimated values. (If the true correlation between the
successive estimates of $N_t$ in this experiment was $\rho = -0.3$, these relative errors would only
be raised by 3 %.) In regard to any practical use of these estimates of $P_t$ and their standard
errors, it may be noted that we could compare the difference between the values for two
independent populations observed, for instance, over the same time interval, though any
tests of significance should be interpreted with a certain degree of caution in the light of the
foregoing remarks. There are, however, difficulties in comparing differences between the
$P_t$ for different time intervals within the same population, since the successive estimates are
correlated owing to the occurrence of $\check N_{t+1}$ in the numerator of $P_t$ and in the denominator
of $P_{t+1}$.

This method of estimating the values of the successive survival factors is closely related
to that suggested by Jackson (1948, and earlier papers). There are, however, slight differences
between the two methods. Jackson adopts the same system of grouping the original data,
namely, in terms of $k_{xt}$, the number of individuals recaptured at time $t$ who received a mark
at some previous time $x$; but he then corrects these observed values to their most probable
value, supposing that the size of the samples taken at times $t$ and $x$ had both been equal
to 100. Thus, in the symbolism used here, the observed $k_{xt}$ are first replaced by

$$y_{xt} = \frac{10^4\,k_{xt}}{R_x R_t}.$$

Then, the survival factor over the interval of time $x$ to $x + 1$ is estimated by

$$P_x = \sum_{t=x+2}^{T} y_{xt} \Big/ \sum_{t=x+2}^{T} y_{x+1,t}
      = \frac{R_{x+1}}{R_x} \sum_{t=x+2}^{T} \frac{k_{xt}}{R_t} \Big/ \sum_{t=x+2}^{T} \frac{k_{x+1,t}}{R_t},$$

with

$$V(P_x) = \frac{P_x + P_x^2}{\sum_{t=x+2}^{T} k_{x+1,t}},$$

the sum of the uncorrected $k_{xt}$ values being used in the denominator of the expression for
the variance.
Writing the estimate of $P_x$ used here in terms of the maximum likelihood estimates $\hat N_x$
instead of the adjusted $\check N_x$, in order to simplify the comparison between the two methods,
we have

$$P_x = \frac{\hat N_{x+1}}{\hat N_x}
      = \frac{R_{x+1}}{R_x} \cdot \frac{\sum_{t=x+2}^{T} R_t}{\sum_{t=x+1}^{T} R_t} \cdot \frac{\sum_{t=x+1}^{T} k_{xt}}{\sum_{t=x+2}^{T} k_{x+1,t}},$$

and it will be seen that, apart from the use of $y_{xt}$, the main difference is in the number of
$k_{xt}$ terms summed in the respective numerators. Thus, $k_{xt}$ and the corresponding $R_t$ are
summed from $t = x + 1$ to $T$ in the latter expression for $P_x$, and from $t = x + 2$ to $T$ in the
former. We might therefore expect a slightly greater degree of precision through the use
of this additional sample, since a greater proportion of the total information yielded by the
data is being utilized. (As an illustration of this point, the values of $P_x$ obtained by Jackson's
method for the numerical example were $P_0 = 0.655 \pm 0.193$, $P_1 = 0.739 \pm 0.236$ and
$P_2 = 0.867 \pm 0.328$. In each case, both the absolute and relative errors are greater than
those given above.) But, against this possible advantage in favour of the method used here,
we must balance the fact that Jackson's method is the more general and may also be
employed in the case of a population which is being recruited by a variable number of births
and immigrants, whereas the estimates developed in this section are based on the
assumption that only a variable death-rate is in operation.
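
The figures quoted for Jackson's method can be checked against the Drawing B table of $k_{xt}$ values. The sketch below assumes the estimator is the ratio of the corrected sums $\sum y_{xt}$ and $\sum y_{x+1,t}$ over $t = x+2, ..., T$, with variance $(P_x + P_x^2)/\sum k_{x+1,t}$; the function names are illustrative only.

```python
from math import sqrt

# Method A recaptures k[x][t] for Drawing B; cells with t <= x are stored as 0.
k = [
    [0, 7, 7, 7, 5],
    [0, 0, 12, 9, 8],
    [0, 0, 0, 10, 13],
    [0, 0, 0, 0, 15],
]
R = [50, 50, 50, 50, 50]


def corrected(k, R, x, t):
    """Recaptures scaled to samples of 100 at both time x and time t."""
    return 1e4 * k[x][t] / (R[x] * R[t])


def jackson_survival(k, R, x):
    """Return (P_x, standard error) using only samples from t = x + 2 onwards."""
    T = len(R) - 1
    later = range(x + 2, T + 1)
    numerator = sum(corrected(k, R, x, t) for t in later)
    denominator = sum(corrected(k, R, x + 1, t) for t in later)
    p = numerator / denominator
    uncorrected_sum = sum(k[x + 1][t] for t in later)
    return p, sqrt((p + p * p) / uncorrected_sum)


if __name__ == "__main__":
    for x in range(3):
        p, se = jackson_survival(k, R, x)
        print(f"P_{x} = {p:.3f} +/- {se:.3f}")
```

Because every sample here has the same size, the correction to samples of 100 cancels and the estimates reduce to ratios of row totals that omit the first recapture column of each row.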


(b) Method B grouping
Although the foregoing method is an extremely convenient and expeditious way of
estimating the number of individuals in a population of the type we are considering, it does
not appear to make use of the total amount of information contained in a given body of
data. Thus, suppose that instead of grouping the data by Method A, as we have done
hitherto in this section, we now adopt the Method B system, in which we divide the sample
of $R_t$ taken at time $t$ into unmarked individuals ($u_t$) and the number ($m_{xt}$) last captured at
time $x$ ($x = 0, 1, 2, ..., t-1$; $\sum_x m_{xt} + u_t = R_t$). Then, as before, if $N_0$ is the total number of
individuals in the population at some origin of time and $P_{012\ldots(t-1)}N_0$ the number of these
surviving at time $t$, the expected number of individuals in the population as a whole falling
into the various marked and unmarked classes can easily be written down. Thus, for
example, for $t = 1, 2, 3$ we have the following table, where at each time $t$ the first column
gives the expected number in the population and the second the number in the respective
classes observed in the sample of $R_t$:

  t = 1
      $P_0(N_0 - R_0)$                                   $u_1$
      $P_0R_0$                                           $m_{01}$
      Total  $P_0N_0$                                    $R_1$
  t = 2
      $P_{01}(N_0 - R_0) - P_1u_1$                       $u_2$
      $P_{01}R_0 - P_1m_{01}$                            $m_{02}$
      $P_1R_1$                                           $m_{12}$
      Total  $P_{01}N_0$                                 $R_2$
  t = 3
      $P_{012}(N_0 - R_0) - P_{12}u_1 - P_2u_2$          $u_3$
      $P_{012}R_0 - P_{12}m_{01} - P_2m_{02}$            $m_{03}$
      $P_{12}R_1 - P_2m_{12}$                            $m_{13}$
      $P_2R_2$                                           $m_{23}$
      Total  $P_{012}N_0$                                $R_3$

Putting $N_0^{-1} = X_0$, $(P_0N_0)^{-1} = X_1$, $(P_{01}N_0)^{-1} = X_2$, etc., the expected proportions in the
various classes become linear functions of the $X_t = N_t^{-1}$, and the table of expectations and
observed values may be rewritten as

  t = 1
      $1 - R_0X_0$                                       $u_1$
      $R_0X_0$                                           $m_{01}$
      Total  1                                           $R_1$
  t = 2
      $1 - R_0X_0 - u_1X_1$                              $u_2$
      $R_0X_0 - m_{01}X_1$                               $m_{02}$
      $R_1X_1$                                           $m_{12}$
      Total  1                                           $R_2$
  t = 3
      $1 - R_0X_0 - u_1X_1 - u_2X_2$                     $u_3$
      $R_0X_0 - m_{01}X_1 - m_{02}X_2$                   $m_{03}$
      $R_1X_1 - m_{12}X_2$                               $m_{13}$
      $R_2X_2$                                           $m_{23}$
      Total  1                                           $R_3$

The maximum-likelihood equations for estimating the $X_t$ may easily be written down.
Thus, for example, if we merely had a short chain consisting of the samples taken at $t = 0$,
1 and 2, we should have

$$\frac{\partial L}{\partial X_0} = \frac{m_{01}}{X_0} + \frac{m_{02}R_0}{R_0X_0 - m_{01}X_1} - \frac{u_1R_0}{1 - R_0X_0} - \frac{u_2R_0}{1 - R_0X_0 - u_1X_1} = 0$$

and

$$\frac{\partial L}{\partial X_1} = \frac{m_{12}}{X_1} - \frac{u_1u_2}{1 - R_0X_0 - u_1X_1} - \frac{m_{01}m_{02}}{R_0X_0 - m_{01}X_1} = 0.$$

These would be troublesome equations to solve directly for the two unknowns, and it can
be seen that in the case of a longer chain the complexities will increase with the number of
$X_t$ we wish to estimate. The simplest method of solution in such cases will presumably be
to approximate to $X_t$ by inserting trial values in $\partial L/\partial X_t$ and the second differential coefficients
$\partial^2L/\partial X_t^2$, $\partial^2L/\partial X_a\partial X_b$, etc., and obtaining improved approximations in the usual way. Thus,
if $\xi_1$ is the column vector of a set of trial values of $X_t$, $\eta_1$ the vector of scores obtained by
inserting these in the $\partial L/\partial X_t$, and $M$ the matrix of the numerical values for the second
differential coefficients (taken with their signs reversed, i.e. the elements $-\partial^2L/\partial X_a\partial X_b$),
then the next approximation to $X_t$ is given by

$$\xi_2 = \xi_1 + M^{-1}\eta_1.$$

This process of successive approximation, however, is likely to be somewhat laborious,
particularly in the case of a long chain of samples.

In regard to the form of the information matrix, from which by inversion the asymptotic
variances and covariances of the $X_t$ can be obtained, it will be seen, taking the above
equations for estimating $X_0$ and $X_1$ as an example, that we have

$$\frac{\partial^2 L}{\partial X_0\,\partial X_1} = -\frac{u_1u_2R_0}{(1 - R_0X_0 - u_1X_1)^2} + \frac{m_{01}m_{02}R_0}{(R_0X_0 - m_{01}X_1)^2}.$$

Then, replacing the observed $u_1$, $u_2$, $m_{01}$ and $m_{02}$ by their expected values, we find

$$\frac{\partial^2 L}{\partial X_0\,\partial X_1} = 0.$$

In a similar way it may be shown that for any number of $X_t$, $-\partial^2L/\partial X_a\partial X_b = 0$ ($a \neq b$).
Thus, the information matrix is diagonal in form when observed values are replaced by their
expectations.
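In order to illustrate the comparative amount of information ($I_t$) in regard to the $X_t$ for
the two methods, A and B, of grouping the data, we may take a chain of four samples from
which $X_0$, $X_1$ and $X_2$ can be estimated. (When the data are grouped by Method A the
maximum-likelihood equations can equally well be expressed in terms of $X_t = N_t^{-1}$, instead
of $N_t$, as was done in the first part of this section.) We find, for Method B,

$$I(B)_0 = \frac{R_0}{X_0(1 - R_0X_0)}\left[R_1 + \frac{R_2}{1 - R_1X_1} + \frac{R_3}{(1 - R_1X_1)(1 - R_2X_2)}\right],$$
$$I(B)_1 = \frac{R_1}{X_1(1 - R_1X_1)}\left[R_2 + \frac{R_3}{1 - R_2X_2}\right],$$
$$I(B)_2 = \frac{R_2R_3}{X_2(1 - R_2X_2)},$$

and for Method A,

$$I(A)_0 = \frac{R_0(R_1 + R_2 + R_3)}{X_0(1 - R_0X_0)}, \qquad
  I(A)_1 = \frac{R_1(R_2 + R_3)}{X_1(1 - R_1X_1)}, \qquad
  I(A)_2 = \frac{R_2R_3}{X_2(1 - R_2X_2)}.$$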

Thus, $I_2$ is the same for both methods of grouping, while in the case of both $X_0$ and $X_1$,
$I(B) > I(A)$, since $1 - R_1X_1$ and $1 - R_2X_2 < 1$. In a similar way it may be shown that for
a chain of samples taken from a population of the type we are considering at times
$0, 1, 2, ..., T$, we have

$$I(B)_{T-1} = I(A)_{T-1},$$
$$I(B)_t > I(A)_t \qquad (t = 0, 1, 2, ..., T-2),$$

the difference between the two methods of grouping becoming progressively greater as the
sequence $X_{T-2}$, $X_{T-3}$, etc., approaches $X_0$. This loss of information in the Method A system
of grouping is, however, not necessarily very great compared with the relative errors of the
resulting estimates. Thus, for example, the following were the expected percentage errors
of the estimates of $X_t$ for the two methods of grouping in the case of the sampling experiment:

          $\pm 100\sigma_X/X$
  t       Method B      Method A      $100\,(B/A)^2$
  0       ± 12.75       ± 15.81        65
  1       ± 13.99       ± 16.41        73
  2       ± 16.46       ± 18.00        84
  3       ± 22.62       ± 22.62       100

It will be seen that the efficiency of A compared with B is some 65 % for $X_0$, and that the
difference between the relative errors of the estimates is just over 3 %. This difference
becomes progressively less for $X_1$, $X_2$ and $X_3$. Evidently on data of this type for a short chain
of samplings, the ease and rapidity with which we may estimate the $X_t$ when the data are
grouped by Method A, compared with the more laborious solution for Method B, offers
immediate practical advantages. The error to be attached to the estimates is not very
greatly increased, while the time spent in the actual computations is very much less.
Moreover, since we are ultimately interested in the estimates of total numbers and not their
reciprocals, there is the question whether the maximum likelihood estimates $\hat N_t = \hat X_t^{-1}$ for
the equations based on the Method B grouping will be biased. If so, and on general grounds
it seems likely there will be some degree of positive bias, there is no adjusted estimate of $N_t$
which immediately suggests itself, as in the case of Method A. However, this question of
bias will tend to become of less importance when the number of recaptures is relatively large,
and in such cases the Method B system of grouping may be preferred if the maximum degree
of precision in the estimates of total numbers is required.
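
The expected percentage errors in the table can be recomputed from the true values of $N_t$ and the sample sizes. The sketch below assumes that the Method B information for $X_x$ takes the form $R_x\{X_x(1 - R_xX_x)\}^{-1}\sum_t R_t/\prod_{j=x+1}^{t-1}(1 - R_jX_j)$, which is an extrapolation from the four-sample case treated in the text, and uses the rounded values of $N_t$ as printed, so that agreement with the table is only to within a unit or so in the last digit. The function names are illustrative.

```python
from math import sqrt

N = [300, 252, 212, 178, 150]   # true N_t at t = 0..4, as printed (rounded)
R = [50, 50, 50, 50, 50]        # sample sizes R_t


def info_method_a(x, N, R):
    """Expected information on X_x = 1/N_x from the Method A grouping."""
    X = 1.0 / N[x]
    return R[x] * sum(R[x + 1:]) / (X * (1.0 - R[x] * X))


def info_method_b(x, N, R):
    """Expected information on X_x from the Method B grouping (assumed general form)."""
    X = 1.0 / N[x]
    total, shrink = 0.0, 1.0
    for t in range(x + 1, len(R)):
        total += R[t] / shrink
        shrink *= 1.0 - R[t] / N[t]   # factor (1 - R_t X_t), applied to later samples
    return R[x] * total / (X * (1.0 - R[x] * X))


def percent_error(info, x, N):
    """Expected standard error of X_x as a percentage of X_x."""
    return 100.0 * sqrt(1.0 / info) * N[x]


if __name__ == "__main__":
    for x in range(4):
        a = percent_error(info_method_a(x, N, R), x, N)
        b = percent_error(info_method_b(x, N, R), x, N)
        print(f"t={x}  B={b:6.2f}  A={a:6.2f}  efficiency={100 * (b / a) ** 2:5.1f}")
```

The two groupings coincide for the last estimable $X_t$ because only one later sample contributes to it.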

In order to solve these maximum-likelihood equations by some iterative process, it is
necessary to have some approximate values of the unknowns as a starting point. It is true
that these may always be obtained from the data grouped by Method A, but it is convenient
to be able to arrive at some first approximations when the grouping has already been carried
out according to B. (Nothing is more exasperating than to find that in order to carry out
some computation or other, it is necessary to change the method of grouping, once this has
already been done in some particular way. There is, for instance, no obvious relationship
between Methods A and B which can be used as a short cut, and a change from one to the
other requires a complete retabulation of the whole of the original data.) The following
method of obtaining approximate estimates has been found most useful in practice.
Essentially, it consists in replacing the $k_{xt}$ values for the Method A system of grouping by
values which are in part observations and in part expectations. Once this has been done,
then either $X_t$ or $N_t$ may be estimated by the methods given in the first part of this section.

In the first place it may be noted that in both systems of grouping certain classes are the
same; thus we always have $k_{xt} = m_{xt}$ for $x = t - 1$. Consider now, for example, the expected
proportions in the population and the observed $m_{xt}$ recaptured at $t = 2$ and 3 when the data
are grouped by Method B, namely,

  t = 2
      $R_0X_0 - m_{01}X_1$                     $m_{02}$
      $R_1X_1$                                 $m_{12}$
  t = 3
      $R_0X_0 - m_{01}X_1 - m_{02}X_2$         $m_{03}$
      $R_1X_1 - m_{12}X_2$                     $m_{13}$
      $R_2X_2$                                 $m_{23}$

If the sample of $R_3$ is the last taken in the chain, then, as shown above, $I(A)_2 = I(B)_2$,
and, since $k_{23} = m_{23}$, we immediately have the maximum-likelihood estimate for $X_2$ when
the data are grouped by Method A, namely,

$$X_2 = m_{23}/(R_2R_3).$$

Now, if $r_{012\ldots t}$ is the number of individual animals recaptured at time $t$ bearing the previous
marks $0, 1, 2, ..., t-1$, we have (I, §§2, 3 and 4)

$$m_{12} = r_{12} + r_{012},$$
$$k_{13} = r_{13} + r_{013} + r_{123} + r_{0123} = m_{13} + r_{123} + r_{0123}.$$

Then the expected number of survivors of the $m_{12}$ animals caught and marked at $t = 2$,
which will be recaptured at $t = 3$, is $R_3P_2m_{12}/(P_{012}N_0)$. Thus, given an estimate of
$X_2 = (P_{01}N_0)^{-1}$, the expected number $E(r_{123} + r_{0123}) = m_{12}R_3X_2$, and we may write

$$k'_{13} = m_{13} + m_{12}R_3X_2,$$

where a dash is attached to the symbol in order to indicate that we have replaced the
observed $r_{123}$ and $r_{0123}$ in $k_{13}$ by their expected values. Since $k_{12} = m_{12}$, an estimate may now
be made of $X_1$, thus

$$X_1 = \frac{m_{12} + k'_{13}}{R_1(R_2 + R_3)},$$

and hence, proceeding in a similar fashion to the formation of $k'_{13}$, we have from the top line
of the above table of expected proportions and observed numbers,

$$k'_{02} = m_{02} + m_{01}R_2X_1,$$
$$k'_{03} = m_{03} + m_{01}R_3X_1 + m_{02}R_3X_2.$$

Finally, since $m_{01} = k_{01}$,

$$X_0 = \frac{k_{01} + k'_{02} + k'_{03}}{R_0(R_1 + R_2 + R_3)}.$$

It is easily seen how this method of estimating the $X_t$ may be extended to a chain of
greater length, starting with the estimate $X_{T-1}$ which is made from the results of the last
sampling. At the same time, if we do not wish to proceed to the full solution of the
maximum-likelihood equations for the Method B system of grouping, we may obtain an adjusted
estimate $\check N_x$ of the total numbers in an analogous way to that described in the first part of
this section, without having to regroup the data. Thus, we may take

$$\check N_x = R_x\left(\sum_{t=x+1}^{T} R_t + 1\right) \Big/ \left(\sum_{t=x+1}^{T} k'_{xt} + 1\right) \qquad (x = 0, 1, 2, ..., T-1).$$

It is not quite clear what the theoretical sampling variance of this estimate should be, but
we shall not be far wrong in taking

$$V(\check N_x) = \check N_x^2\left[\sum_{t=x+1}^{T} u'_{xt} \Big/ \left\{\left(\sum_{t=x+1}^{T} R_t + 1\right)\left(\sum_{t=x+1}^{T} k'_{xt} + 2\right)\right\}\right],$$

where in both expressions dashes are attached to $k_{xt}$ and $u_{xt} = R_t - k_{xt}$ to indicate the
difference between the values as calculated here and the observed integers which would be
obtained by grouping the data in the first place according to Method A.


Numerical example 


From the practical point of view the most convenient way of carrying out the actual calculations is
to obtain first of all the various sums of the $k'_{xt}$ values. This may very quickly be done in the
following way.

Given a table of observed $m_{xt}$ values for a chain of samples taken at $t = 0, 1, 2, ..., T$, we build up in
turn a series of column multipliers ($z_t$) which are given by

$$z_T = 1, \qquad z_t = 1 + \frac{1}{R_t}\sum_{s=t+1}^{T} k'_{ts} \qquad (t = T-1, T-2, ..., 1).$$

Then, working backwards from time $T$, the successive sums $\sum k'_{xt}$ are

$$k'_{T-1,T} = m_{T-1,T},$$
$$\sum_{t=x+1}^{T} k'_{xt} = \sum_{t=x+1}^{T} z_t m_{xt},$$

each sum as it is formed being used to calculate the next $z_t$ figure. The whole procedure is best
illustrated by a numerical example. Thus, taking the data for the same Drawing B as was used earlier
in this section, the following was the table of observed $m_{xt}$ values:

  x \ t      0      1      2      3      4     $\sum k'_{xt}$    $X_x$       $\check N_x$
  0          .      7      5      6      2     28.62             0.002862    339
  1          .      .     12      9      4     32.98             0.004396    222
  2          .      .      .     10      9     22.00             0.004400    220
  3          .      .      .      .     15     15.00             0.006000    159
  $R_t$     50     50     50     50     50
  $z_t$      .   1.66   1.44   1.30   1.00

Then, writing down immediately $z_4 = 1.00$ and $k'_{34} = 15.00$ at the foot of the corresponding column and
row as indicated above, we form
$$z_3 = 1 + k'_{34}/R_3 = 1 + 15/50 = 1.30,$$
this figure being entered at the foot of the corresponding column. By operating with these two values
of $z_t$ on the $x = 2$ row of the table,
$$\sum k'_{2t} = 10 \times 1.30 + 9 \times 1.00 = 22.00.$$
Then, in succession,
$$z_2 = 1 + 22.00/50 = 1.44,$$
$$\sum k'_{1t} = 12 \times 1.44 + 9 \times 1.30 + 4 = 32.98,$$
$$z_1 = 1 + 32.98/50 = 1.660,$$
and finally from the top row,
$$\sum k'_{0t} = 7 \times 1.66 + 5 \times 1.44 + 6 \times 1.30 + 2 = 28.62.$$

The values of either $X_x$ or $\check N_x$, whichever is required, can now be calculated by means of the equations
which have already been given. Thus, for example,

$$X_0 = \frac{28.62}{50(50 + 50 + 50 + 50)} = 0.002862,$$
$$\check N_0 = \frac{50(200 + 1)}{28.62 + 1} = 339,$$

the variance of this latter estimate being calculated in a similar way to that given earlier in this section
as an illustration of the Method A system of grouping.

This method of estimating $N_t$ from the data grouped in this way gives very similar results
to those obtained from the observed numbers in the various $k_{xt}$ classes. Thus, the following
were the observed means and variances of the $\check N_t$ for the twenty replicates in the sampling
experiment for the two methods of grouping.

        True value    Observed $\check N_t$          Expected      Observed $s^2$
  t     of $N_t$      Method A     Method B          $\sigma^2$    Method A     Method B
  0     300           311.9        302.5             2250          3089         2477
  1     252           262.0        260.9             1710          1508         2154
  2     212           213.3        209.7             1456          2077         1354
  3     178           179.4        179.4             1622          1385         1385

Clearly there is little to choose between the two series of results. The agreement with
expectation is, if anything, closer for the estimates made from the data grouped according
to Method B, but the differences are so slight compared with the sampling errors that either
method might equally well have been used. The irregular discrepancies between the
observed means of the twenty individual estimates of $\check N_t$ and the true values are apparently
caused in each case by random departures from expectation in the data themselves. This
point was checked by first calculating the mean number of recaptures in each $k_{xt}$ and $m_{xt}$
class from the observed frequencies, and then estimating the $N_t$ in each case from these
tables of mean values. In doing so the maximum-likelihood estimate $\hat N_t$ was used, and not
the adjusted $\check N_t$. From the Method A table the values of $\hat N_t$ for $t = 0, 1, 2, 3$ were 309.6, 263.6,
212.8 and 181.8, while from the Method B table the corresponding values were 300.8,
260.5, 210.5 and 181.8. These figures are very close to the means of the $\check N_t$ for the individual
replicates, which are given in the above table, and show the same irregular departures from
the true values. This agreement is also evidence that the adjusted estimate $\check N_t$ is relatively
unbiased, and justifies the empirical use which has been made of it in the present problems
of estimation.


(c) The number of survivors from a known group of individuals 

Finally, it should be added that although these problems have been considered here in
terms of a series of samples taken from a population consisting of an unknown number
of individuals at some origin of time, the same methods are applicable when we wish to
estimate the number of survivors from a known number of animals alive on a certain date.
Thus, suppose that a group of $G_d$ individuals are alive at a date $d$, or, in other words, they
are marked and released at that time. Any members of this group which are recaptured in
any subsequent samples from the general population can be recognized by the presence of
the mark $d$. Now, obviously, the number $P_dG_d$ surviving out of the original $G_d$ at the time
of the first resampling of the population after the date of their release is unknown, and will
therefore be equivalent to $N_0$ in the symbolism in this section. Similarly, if there are $g_d$
recaptures of this group at the first resampling, who receive a mark and are returned to the
population, these individuals are equivalent to what we have termed here the sample of
$R_0$ captured and marked at $t = 0$. Thus, with reference to this group of $G_d$ individuals, if
we call the time at which the first resampling was made after the date of their release $t = 0$,
and if we then form the tables of either the $k_{xt}$ or $m_{xt}$ values, we may estimate the number
out of the original $G_d$ surviving at $t = 0, 1, 2, ..., T-1$. There is, however, one unsatisfactory
feature of this method of estimating the number of survivors. Just as in the general case of
sampling from an unknown population, an estimate of $N_0$, for instance, may by chance be
greater than the true value, so we may obtain an estimate of the number of survivors at some
time $t$ which is greater than the original number $G_d$ in the group. At present there appears to
be no obvious way out of this difficulty, which is particularly liable to occur for earlier values
of $t$ in cases when the death-rate is not very great and a relatively small number of the
original group are recaptured. Another method of approach in such cases would be to
assume that the death-rate for the group remains approximately constant during the
period of sampling and to estimate the constant $P$ by the methods given in the previous
paper. Then the number of survivors at subsequent times could be calculated from the
known value of $G_d$ and the estimate of $P$ and its standard error.


3. WHEN ONLY TWO PARAMETERS ARE NEEDED TO DESCRIBE THE POPULATION 


Although the methods described in the previous section have been developed in terms of 
a variable death-rate, it is evident from the numerical examples which have been given 
that the errors of estimation may be such that in a great deal of data which will be met with 
in practice, we might equally well regard the death-rate as a constant over the period of
sampling. Then, in a population of this type, in which the dilution factors are zero, only two
parameters would need to be determined, namely, the number of individuals $N_0$ at the time
of the initial sampling and the constant survival factor P. When the number of recaptures
is not very great and the death-rate is not changing to any marked extent, this assumption 
of a constant death-rate may be sufficiently accurate for all ordinary purposes. 

In order to solve this problem of estimation, it is best to group the data according to
Method B. We can then easily write down a table of expected population numbers and
observed numbers in a similar form to that given in the previous section for a variable
death-rate when the data are grouped in this way. (In order to save space, this will not be
done here, but the form of the table is easily seen by replacing the variable survival factors
in the table referred to by the corresponding powers of P.) Then it will be found that the
maximum-likelihood equations for the simultaneous estimation of $N_0$ and P can only be
solved by some process of successive approximation. The somewhat tedious calculations
which would be involved in this process can, however, be avoided to some extent, and we
may obtain estimates of the two parameters at the cost of some loss of information by
dividing the work into two parts. As a first step, the parameter P, together with its variance,
is estimated from the distributions of recaptures by the method described in the previous
paper (I, §5 and Appendix I). This estimate is then used to determine $N_0$ in the following way.

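Considering only the two broad classes of marked and unmarked individuals, $s_t$ and $u_t$
respectively, in the samples of $R_t$ taken at t = 0, 1, 2, ..., T, we have the table of population
and observed numbers, for example, at t = 1, 2, 3,

  t    Class       Expected population number                    Observed
  1    Unmarked    P N_0 - P R_0                                 u_1
       Marked      P R_0                                         s_1
       Total       P N_0                                         R_1
  2    Unmarked    P^2 N_0 - (P^2 R_0 + P u_1)                   u_2
       Marked      P^2 R_0 + P u_1                               s_2
       Total       P^2 N_0                                       R_2
  3    Unmarked    P^3 N_0 - (P^3 R_0 + P^2 u_1 + P u_2)         u_3
       Marked      P^3 R_0 + P^2 u_1 + P u_2                     s_3
       Total       P^3 N_0                                       R_3

Then forming the log likelihood equation and differentiating with regard to $N_0$,

$$\frac{\partial L}{\partial N_0} = \frac{u_1}{N_0 - R_0} + \frac{u_2 P}{P N_0 - (P R_0 + u_1)} + \frac{u_3 P^2}{P^2 N_0 - (P^2 R_0 + P u_1 + u_2)} - \frac{R_1 + R_2 + R_3}{N_0} = 0.$$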


For a given value of P this equation may be solved by iteration by a similar method to that
already described for estimating P from the distributions of recaptures (I, Appendix I).
There are various ways of rewriting this equation in order to simplify the actual computations,
the one adopted by the author being to replace $N_0$ by $X = N_0^{-1}$ and then by inserting
trial values to find the value of X which will make

$$R_1 + R_2 + R_3 - \frac{u_1}{1 - R_0 X} - \frac{u_2}{1 - (R_0 + P^{-1} u_1) X} - \frac{u_3}{1 - (R_0 + P^{-1} u_1 + P^{-2} u_2) X} = 0.$$

Once the terms in the denominators within the brackets have been calculated for the given
P and the observed $u_t$, the solution follows very quickly with the help of a table of reciprocals.
It must be remembered, however, that in rearranging the original equation in this way, we
shall actually be calculating the value of $X_i\,\partial L/\partial X_i$ for a trial value $X_i$, and it will therefore
be necessary to obtain by division $\partial L/\partial X_1$, $\partial L/\partial X_2$, etc., before interpolating for the estimate
$\hat X$ which makes $\partial L/\partial X = 0$. This point is of importance, since in order to estimate the
variance of $\hat X$, we require $a_{11} = -\partial^2 L/\partial X^2$ and $a_{12} = -\partial^2 L/\partial P\,\partial X$, and approximate values
of these quantities may also be obtained from the iteration. Thus, given the variance V(P),
if we put

$$a_{22} = \frac{1}{V(P)} + \frac{a_{12}^2}{a_{11}},$$

then

$$V(\hat X) = \frac{a_{22}}{a_{11}}\, V(P),$$

and hence for $\hat N_0 = \hat X^{-1}$,

$$s_{\hat N_0} = \pm \hat N_0^2 \sqrt{V(\hat X)}.$$
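
The solution for X = 1/N_0 at a given P can be sketched numerically as follows. This is only an illustration under stated assumptions: it uses plain bisection in place of the trial values, table of reciprocals and interpolation described in the text, the function name and argument layout are hypothetical, and the catches are artificial figures constructed so that the root is exactly N_0 = 300 when P = 0.8.

```python
def solve_initial_number(R, u, P, tol=1e-12):
    """Solve X * dL/dX = 0 for X = 1/N_0 by bisection, for a given P.

    R : total catches R_0, R_1, ..., R_T
    u : unmarked catches u_0, u_1, ..., u_T (u[0] is ignored, since
        every animal in R_0 is marked and released at t = 0)
    """
    T = len(R) - 1
    # Bracketed coefficients of X: R_0, R_0 + u_1/P, R_0 + u_1/P + u_2/P^2, ...
    coeffs = []
    c = float(R[0])
    for t in range(1, T + 1):
        coeffs.append(c)
        c += u[t] / P ** t
    total_catch = sum(R[1:])

    def g(X):
        # g(X) = X * dL/dX; it decreases steadily as X increases.
        return total_catch - sum(
            u[t] / (1.0 - coeffs[t - 1] * X) for t in range(1, T + 1)
        )

    lo = 0.0
    hi = (1.0 - 1e-12) / max(coeffs)  # just inside the largest admissible X
    if g(lo) <= 0.0 or g(hi) >= 0.0:
        raise ValueError("no root is bracketed; check the data")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Artificial data (not from the paper): exact expectations for N_0 = 300, P = 0.8.
R_example = [50, 60, 48, 64]
u_example = [0, 50, 30, 30]
X_hat = solve_initial_number(R_example, u_example, P=0.8)
N0_hat = 1.0 / X_hat
```

Bisection is chosen here only because the left-hand side is monotonic in X, so a bracketed root is always found; the interpolation between two trial values used in the text gives the second derivative as a by-product, which bisection does not.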


Numerical example
Using this form of the original maximum-likelihood equation for $N_0$, the following were the values
of $\partial L/\partial X$ for $X_1 = 0.0031$ and $X_2 = 0.0032$, in the case of one of the replicates in the sampling experiment
(Drawing L) for which $\hat P = 0.862$ and $V(\hat P) = 0.005714$ (I, Table 2, Method B),

$$\frac{\partial L}{\partial X_1} = +1053, \qquad \frac{\partial L}{\partial X_2} = -410.$$

Then, as a first approximation,

$$\hat X = 0.0031 + \frac{1053 \times 0.0001}{1463} = 0.00317,$$

and

$$a_{11} = -\frac{\partial^2 L}{\partial X^2} = \frac{1463}{0.0001} = 1463 \times 10^4.$$

In order to obtain an approximate value for $a_{12}$, we use in the same equation this estimate $\hat X = 0.00317$
and two trial values of the survival factor, $P_1$ and $P_2$, which are taken at equal deviations on the
positive and negative side from $\hat P = 0.862$. Thus for $P_1 = 0.860$ and $P_2 = 0.864$, respectively, the
two values of $\partial L/\partial X$ were found to differ by 225.

Hence $$a_{12} = -\frac{225}{0.004} = -56250.$$
Then, $$a_{22} = 1/0.005714 + (56250^2/1463) \times 10^{-4} = 391,$$
$$V(\hat X) = (391/1463) \times 0.5714 \times 10^{-6} = 0.1527 \times 10^{-6},$$
$$s_{\hat X} = \pm 0.3908 \times 10^{-3}.$$
Finally, since $\hat N_0 = \hat X^{-1} = 315$,

$$s_{\hat N_0} = \pm 315^2 \times 0.3908 \times 10^{-3} = \pm 39.$$
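
The arithmetic of this example can be retraced with a few lines of Python. The sketch below takes the quoted values (the two derivatives at X = 0.0031 and 0.0032, a_12 = -56250 and V(P) = 0.005714) as given; the function names are hypothetical, and the small differences from the printed figures arise only because the text rounds X to 0.00317 and N_0 to 315 before continuing.

```python
import math


def interpolate_root(x1, d1, x2, d2):
    """Linear interpolation for the X at which dL/dX = 0.

    Returns the interpolated root and a_11 = -d2L/dX2, approximated by
    minus the slope of dL/dX between the two trial values.
    """
    slope = (d2 - d1) / (x2 - x1)
    x_hat = x1 - d1 / slope
    return x_hat, -slope


def variance_of_x(a11, a12, var_p):
    """a_22 = 1/V(P) + a_12^2 / a_11 and V(X) = (a_22 / a_11) V(P)."""
    a22 = 1.0 / var_p + a12 ** 2 / a11
    return a22, (a22 / a11) * var_p


x_hat, a11 = interpolate_root(0.0031, 1053.0, 0.0032, -410.0)
a22, var_x = variance_of_x(a11, a12=-56250.0, var_p=0.005714)
n_hat = 1.0 / x_hat                    # about 315
s_x = math.sqrt(var_x)                 # about 0.39e-3
s_n = n_hat ** 2 * s_x                 # about 39
```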


This method of estimating $N_0$ and P separately must lead, at least theoretically, to less
precise results than if we were to solve the full maximum-likelihood equations for the
simultaneous estimation of these parameters. It has, however, the advantage of saving
a great deal of time and labour, and in certain circumstances the loss of information appears
to be negligible. Thus, to take the sampling experiment as an example, the theoretical
variance-covariance matrix for the full process of estimation (five-point sampling) was

$$\Sigma = \begin{pmatrix} 1339 & -1.783 \\ -1.783 & 0.004473 \end{pmatrix},$$

the rows and columns referring to $\hat N_0$ and $\hat P$ respectively.

For the separate estimation of the two parameters, the asymptotic variances were
$V(N_0) = 1408$ and $V(P) = 0.004919$, so that in this case the loss of information was not very
great. Taking the results for the twenty replicates, for each of which $N_0$ and P were estimated
separately by the methods given in this section, the mean values of $\hat N_0$ and $\hat P$ were 309.6 and
0.839 respectively, compared with the true values of 300 and 0.840. (There is a slight
suggestion here that the estimate $\hat N_0$ may be positively biased.) The observed
variance-covariance matrix was

$$S = \begin{pmatrix} 2008 & -2.295 \\ -2.295 & 0.004227 \end{pmatrix},$$

in which each element is based on 19 D.F. Actually, in this example, these observed variances
and covariances are compatible with those given above. For, if we calculate the trace of the
matrix product $\Sigma^{-1} S$, then 19 (tr.) should be distributed as $\chi^2$ with 38 D.F. In the present
case $\chi^2 = 43.6$, which is a perfectly reasonable value to obtain.
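
The trace criterion can be checked directly from the two matrices quoted above. The following is a minimal Python sketch for the 2 x 2 case only; the function name is hypothetical and the matrices are entered as nested lists. Multiplying the trace by the 19 degrees of freedom gives a value of about 43.7, which agrees with the figure of 43.6 in the text to within rounding.

```python
def trace_statistic(sigma, s, dof):
    """Return dof * tr(sigma^{-1} s) for 2 x 2 matrices given as nested lists."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Inverse of a 2 x 2 matrix written out explicitly.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # The trace of a product is the sum over i, k of inv[i][k] * s[k][i].
    trace = sum(inv[i][k] * s[k][i] for i in range(2) for k in range(2))
    return dof * trace


theoretical = [[1339.0, -1.783], [-1.783, 0.004473]]
observed = [[2008.0, -2.295], [-2.295, 0.004227]]
statistic = trace_statistic(theoretical, observed, dof=19)
```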

This expected loss of information, however, may become appreciable in the case of
a shorter chain. Thus, to take the three-point sampling in this experiment (I, §8), in which
only the samples taken at t = 0, 2 and 4 were considered, the asymptotic variances for the
simultaneous estimation of $N_0$ and $P^2$ were $V(N_0) = 3883$ and $V(P^2) = 0.05391$. When these
parameters were estimated separately the expected $V(N_0) = 6566$ and $V(P^2) = 0.07051$.
These differences are quite marked and evidently, if these were data for which the
maximum degree of precision in the estimates was required, it would be necessary to proceed
to the full solution of the maximum-likelihood equations.

Incidentally it will be noted that these comparative figures for the five-point and three- 
point sampling chains in this experiment also illustrate the advantage of increasing the 
number of samples taken within a limited period of time, to which attention has been drawn 
in the previous paper (I, §8) in regard to the estimation of the survival factor. By increasing 
the number of samples from the minimum of three to a total of five, the expected percentage 
error of an estimate of $N_0$ is reduced from about 21 to 12 % (using the most efficient method
of estimation in each case), which is quite a marked increase in precision. Although these 
figures are based on a specific numerical example, and thus cannot hold in general, roughly 
we might expect a similar increase in precision in sampling on this scale from a population 
consisting initially of a few hundred individuals, in which no dilution was occurring through 
births and immigrants during the period of sampling. 


4. WHEN DILUTION IS OCCURRING, AND THE DEATH-RATE IS VARYING BOTH IN
TIME AND BETWEEN DIFFERENT GROUPS OF ANIMALS


It may be of interest to consider very briefly a possible solution for the much more general
problem of estimating the number of individuals $N_t$ alive at time t in a population for which
both the dilution factors $B_t$ and the survival factors $P_t$ are varying in time. When we take
a random sample of the population as a whole, we assume that the marked individuals are
a random sample of the various possible classes of marks, and we may consider these
subclasses separately according to the time the individuals were first marked, as well as
according to their subsequent history. Thus, in a chain of samples, we may consider the
group of $R_0$ individuals initially marked at t = 0 and estimate the number of these
surviving at t = 1, 2, 3, ..., T - 1 by one or other of the methods given in §2. Similarly,
a group of $u_1$ individuals are first marked at t = 1, and the number of these surviving
at t = 2, 3, ..., T - 1 may also be estimated; and so on for all the remaining $u_t$. Then the
number in each group surviving at time t may be written, in terms of a variable survival
factor $P_t$,

$$R_{0t} = P_{012\ldots t-1}\, R_0,$$
$$u_{1t} = P'_{12\ldots t-1}\, u_1,$$
$$u_{2t} = P''_{23\ldots t-1}\, u_2,$$

where dashes are also attached to the symbols to indicate that the survival factors $P_t$, $P'_t$, $P''_t$,
etc., are not necessarily the same for all the groups which are observed over a given interval
of time t to t + 1. Thus it is assumed that the survivorship curves of the various groups may
differ according, for instance, to their age composition at the time they were initially
marked, as well as according to their different mortality experience as time goes on. In the
absence of any means of determining the actual ages of the individuals sampled, this
assumption is probably the most general which can be made in the application of the
capture-recapture method to a wild population.

Having estimated the number of survivors in each group at time t, together with the
variance of these estimates, the expected total number of marked individuals in the
population as a whole at time t is clearly given by

$$\nu_t = R_{0t} + u_{1t} + u_{2t} + \ldots + u_{t-1,t},$$

and since the different groups of animals are independent, the variance of $\nu_t$ will be obtained
by summing the variances of the individual estimates. Then, if in the sample of $R_t$ taken at
time t the observed proportion of marked animals is $p_t = s_t/R_t$, with variance

$$V(p_t) = p_t(1 - p_t)/R_t,$$

the total number of individuals in the population may be estimated by

$$N_t = \nu_t R_t / s_t \qquad (t = 1, 2, 3, \ldots, T - 1),$$

and

$$V(N_t) \simeq N_t^2 \left[ \frac{V(\nu_t)}{\nu_t^2} + \frac{V(p_t)}{p_t^2} \right].$$

Alternatively, the adjusted estimates of $N_t$ and $V(N_t)$ could be used, though in general this will
probably not be necessary, since this method of estimating the $N_t$ would only be applied to
data in which the number of recaptures is relatively large.


5. PRELIMINARY ANALYSIS OF A SET OF DATA 
(a) A test for the absence of dilution 


The methods of estimation which have been discussed in the previous sections fall into 
two main groups depending on whether the population was being diluted by the entry of 
new individuals (§§1 and 4), or whether dilution was absent (§§2 and 3). Apart from the 
very general case briefly discussed in §4, it was assumed throughout that the death-rate, 
whether constant or variable, fell equally on all subclasses of marked and unmarked animals 
in the population. Now, if a particular set of data was obtained from a population in which 
no dilution was occurring, it would be a waste of information to estimate the total numbers 
by the methods given in §§ 1 and 4, since the relative precision of such estimates depends to 
a very great extent on the choice of the most appropriate model to describe the population 
sampled. A preliminary analysis of the data before deciding on the most suitable method of 
estimation may therefore be well worth the additional labour. 
In a large number of cases, of course, this question is easily settled by field observations
made at the time the samples were being taken. Thus the occurrence of increasing catches 
of very young individuals in the unmarked class would immediately exclude any analysis 
of the data by a method which assumes the absence of dilution. There are, however, other 
occasions when it may be suspected that the degree of dilution was so slight that this factor 
could be neglected in the calculations. Cases also arise of the reverse problem. Thus, Chitty 
(1952) gives an example of some results of sampling from a population of the vole, Microtus 
agrestis, in which on general biological grounds no dilution should have been occurring, 
since it was known that no breeding or immigration was taking place during the time the 
marking was being done, and that all age groups were equally at risk of capture. But, on 
analysis, the data appeared to show that an unexpected number of new individuals had 
been entering the population during the period of marking, the most likely explanation of 
this phenomenon being a changing degree of relative trap-shyness among the unmarked 
animals, which affected the results in the same way as dilution. Clearly in such cases the 
biological evidence alone may be misleading. It is therefore useful to have some method of 
testing whether a given series of results are compatible with the assumptions made in 
§§ 2 and 3, that no dilution was occurring and that the death-rate $1 - P_t$ was the same for all
subgroups of the population. 

The most convenient method of doing this unfortunately involves yet another method
of grouping the data, namely, that termed Method C in the previous paper (I, §4), in which
the recaptures are grouped according to the time they were first captured and marked. Thus,
if $c_{xt}$ is the number of individuals caught at time t who were first marked at time x, the total
catch $R_t$ can be separated into the various $c_{xt}$ classes and the unmarked class $u_t$, so that
$\sum_x c_{xt} + u_t = R_t$. Then supposing that no dilution was occurring and that the population was
decreasing merely through the operation of a variable death-rate, the expected numbers of
survivors at time t of the $R_0, u_1, u_2, \ldots, u_{t-1}$ individuals initially marked at times 0, 1, 2, ..., t - 1,
are

$$P_{012\ldots t-1}\, R_0, \qquad P_{12\ldots t-1}\, u_1, \qquad P_{23\ldots t-1}\, u_2, \qquad \ldots, \qquad P_{t-1}\, u_{t-1},$$

assuming that the variable survival factor $P_t$ is the same for all subgroups during the interval
t to t + 1. Since the total number of individuals in the population at time t is $P_{012\ldots t-1} N_0$, it
can immediately be seen that the expected proportion of individuals originally marked
at a particular time x is constant for all values of t = x + 1, x + 2, ..., T. Thus, if we form
a table of the observed $c_{xt}$, in which we also include the unmarked class $u_t$, such as, to take

   x \ t        1        2        3        4        Total
   0            c_01     c_02     c_03     c_04     sum_t c_0t
   1            .        c_12     c_13     c_14     sum_t c_1t
   2            .        .        c_23     c_24     sum_t c_2t
   3            .        .        .        c_34     c_34
   Unmarked     u_1      u_2      u_3      u_4
   Total        R_1      R_2      R_3      R_4

for example a chain of five samplings, then, under the assumptions which have been
made, we should have for the first row, $c_{01}/R_1 = c_{02}/R_2 = c_{03}/R_3 = \ldots$; for the second,
$c_{12}/R_2 = c_{13}/R_3 = c_{14}/R_4 = \ldots$; and so on for all the remaining rows of the $c_{xt}$ values. We might
therefore regard this as a type of contingency table and calculate the expected numbers in
each class from the marginal totals. Thus the expected numbers in the first row are calculated
by multiplying the mean proportion $\sum_{t=1}^{T} c_{0t} \big/ \sum_{t=1}^{T} R_t$ by the appropriate $R_t$; similarly those in
the second row are obtained from $\sum_{t=2}^{T} c_{1t} \big/ \sum_{t=2}^{T} R_t$; and so on for the remaining $c_{xt}$ rows, the
expected number of $u_t$ in each case being filled in by subtraction. Now, if dilution of the
population was occurring, the proportions in each row of the $c_{xt}$ values would not be constant,
but would tend to decrease as time goes on. In calculating the expected numbers in this way,
then either one or other of two things would happen if our assumptions of no dilution and
a death-rate $1 - P_t$ falling equally on all subgroups in the population were incorrect. The
first possibility is that negative expected numbers of $u_t$ would occur for some of the later
values of t, in which case our assumptions are immediately disproved; or all the expected
numbers remain positive. In the latter case, we may calculate $\chi^2$ in the usual way from the
expected and observed numbers, the degrees of freedom for a chain of samples taken at
t = 0, 1, 2, ..., T being T(T - 1)/2. An insignificant value of $\chi^2$ would indicate either that no
dilution was occurring or that the number of new entries was so small relative to the total
population as to be negligible. In such cases we may estimate the total numbers by one or
other of the methods given in §§ 2 and 3. An excessive value of $\chi^2$ is more difficult to interpret,
since it might be caused either by dilution or by a difference in the mortality rates for the
various groups, or by a combination of both of these factors. Unless any differences in the
mortality rates are very marked, however, the most likely explanation will be that it is the
presence of dilution which is causing the departures from expectation, though it would
require a much more detailed analysis to decide definitely between the various alternatives.
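
The calculation of the expected numbers and of the test criterion can be set out as a short Python sketch. The function name and the dictionary layout of the counts are assumptions made for illustration, and the figures in the example are artificial: they are constructed so that each row forms a constant proportion of the total catch, in which case the criterion is zero.

```python
def dilution_chi_square(R, c, u):
    """Contingency-type test on counts grouped by Method C.

    R : dict t -> total catch R_t, for t = 1, ..., T
    c : dict (x, t) -> recaptures at t first marked at x, 0 <= x < t <= T
    u : dict t -> unmarked catch u_t
    Returns (chi_square, degrees_of_freedom), or None when an expected
    number of unmarked animals is not positive, in which case the
    assumptions are taken to be contradicted outright.
    """
    T = max(R)
    expected_c = {}
    for x in range(T):
        later = range(x + 1, T + 1)
        # Mean proportion for row x, pooled over every sample taken after x.
        proportion = sum(c[(x, t)] for t in later) / sum(R[t] for t in later)
        for t in later:
            expected_c[(x, t)] = proportion * R[t]

    chi_square = 0.0
    for t in range(1, T + 1):
        # Expected unmarked number is filled in by subtraction.
        expected_u = R[t] - sum(expected_c[(x, t)] for x in range(t))
        if expected_u <= 0:
            return None
        chi_square += (u[t] - expected_u) ** 2 / expected_u
    for key, expected in expected_c.items():
        if expected > 0:
            chi_square += (c[key] - expected) ** 2 / expected
    return chi_square, T * (T - 1) // 2


# Artificial counts (not from the paper) with constant row proportions.
R_demo = {1: 100, 2: 200, 3: 100}
c_demo = {(0, 1): 10, (0, 2): 20, (0, 3): 10, (1, 2): 40, (1, 3): 20, (2, 3): 30}
u_demo = {1: 90, 2: 140, 3: 40}
result = dilution_chi_square(R_demo, c_demo, u_demo)
```

The last row of the table always contains a single entry, so its expected number equals the observed one and it contributes nothing to the criterion; it is kept in the sketch only so that the expected unmarked numbers are obtained by subtraction from the full column.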

The Method A system of grouping has some of the properties of Method C, as may be seen
from the discussion in §2. Thus the relation between the rows of a table of $k_{xt}$ values and the
total catch is very similar to that of the $c_{xt}$. In the absence of dilution, the proportion which
each $k_{xt}$ forms of the appropriate $R_t$ remains constant in each row of the table. But in
Method A, $\sum_x k_{xt} + u_t \neq R_t$; in other words, we are not dividing the individuals caught at
each sampling into a set of mutually exclusive categories, and in employing this method
of grouping for our present purpose, we should therefore be unable to apply any $\chi^2$ test in
cases of doubt. The disadvantage of having to adopt another system of grouping in order
to test whether or not dilution was occurring is to some extent offset by the fact that if
$\chi^2$ was satisfactory for a particular set of data grouped by Method C, we may obtain estimates
of the total numbers without having to regroup the data, by means of

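$$N_x = u_x \left( \sum_{t=x+1}^{T} R_t + 1 \right) \Big/ \left( \sum_{t=x+1}^{T} c_{xt} + 1 \right) \qquad (u_0 = R_0;\ x = 0, 1, 2, \ldots, T - 1),$$

$$V(N_x) = u_x^2 \left( \sum_{t=x+1}^{T} R_t + 1 \right) \left( \sum_{t=x+1}^{T} R_t - \sum_{t=x+1}^{T} c_{xt} \right) \Big/ \left\{ \left( \sum_{t=x+1}^{T} c_{xt} + 1 \right)^2 \left( \sum_{t=x+1}^{T} c_{xt} + 2 \right) \right\}.$$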


It can be shown, however, that these estimates are theoretically less efficient than those
made in a comparable way from the data grouped by Method A (see §2), the relative
efficiency C/A being given approximately by $u_x(N_x - R_x)/R_x(N_x - u_x)$, which is < 1 for all
values of x > 0.





P. H. LESLIE 387


(b) A method of obtaining approximate estimates from a long chain of samples 

The following method of obtaining approximate estimates of $N_t$, $P_t$ and $B_t$ from a long
chain of samples has proved extremely useful in practice. Suppose that a population of
some species of animal with an intermittent breeding season has been sampled over a period
of several years, during which time a number of successive generations may have been
represented in the catches. (For example, in the case of many small mammals, such as the
wood-mouse, Apodemus sylvaticus, or the vole, Microtus agrestis, the chances are small of
an individual living very much more than a year in the wild state.) Clearly, before embarking
on any detailed analysis of the results, it is necessary to have some rough picture of the
changes which were taking place in the total numbers and of the seasonal variations in the
mortality rates and dilution factors. Since in a preliminary analysis no very great accuracy
may be required, it is well worth while sacrificing a certain amount of information in order
to obtain a rapid and simple method of estimating these parameters.

When the recaptures in a long chain of samples from a population of the type we are
considering are grouped by Method B, the entries in each row of the table of $m_{xt}$ values
gradually become smaller and finally end by becoming zeros as each group of marked
animals dies out. A column of $m_{xt}$ for a time t then may consist of a number of zeros for the
earlier values of x. Consider now the start of this table, such as, for example, the following,
where successive overlapping triangles of entries ($m_{01}$, $m_{02}$, $m_{12}$; then $m_{12}$, $m_{13}$, $m_{23}$; and
so on) are to be picked out:

   x \ t          0       1       2       3       4      ...
   0              .       m_01    m_02    m_03    m_04   ...
   1              .       .       m_12    m_13    m_14   ...
   2              .       .       .       m_23    m_24   ...
   3              .       .       .       .       m_34   ...
   Total catch    R_0     R_1     R_2     R_3     R_4








The first triangle of recaptures, consisting of the entries $m_{01}$, $m_{02}$ and $m_{12}$, may be regarded
as the results of a three-point sampling at t = 0, 1 and 2 from a population consisting of
a variable number $N_t$ of individuals, for which the survival factors $P_0 \neq P_1$ and the dilution
factors $B_0 \neq B_1$. Then from these recaptures an estimate may be made of $P_0$ by the methods
given previously (I, §5), and this value may then be used to determine $N_1$ as in §1 of the
present paper.* Thus $P_0$ and $N_1$ may be estimated from

$$\hat P_0 = \frac{m_{02} R_1 + m_{01} m_{12}}{m_{12} R_0}, \qquad \hat N_1 = \frac{\hat P_0 R_0 R_1}{m_{01}},$$

$$V(\hat P_0) = \frac{m_{02}(m_{02} + m_{12}) R_1^2}{m_{12}^3 R_0^2}, \qquad V(\hat N_1) = \hat N_1^2 \left[ \frac{R_1 - m_{01}}{R_1 m_{01}} + \frac{V(\hat P_0)}{\hat P_0^2} \right].$$

* This can be shown by forming the maximum-likelihood equations for the simultaneous estimation
of the five unknowns, $N_0$, $P_0$, $P_1$, $B_0$ and $B_1$. Since in a three-point sampling only three degrees of
freedom are available, only three parameters can be determined and these prove to be $P_0$, $N_1$ and the
ratio $B_1/P_1$. This last quantity is of little use taken by itself, though if we were to assume that $P_1 = P_0$,
an estimate could be made of $B_1$. In the present problem, however, it is not necessary to make this
assumption. The solutions of the maximum-likelihood equations for $P_0$ and $N_1$ and the variances of
these estimates are the same as those given in I, §5 and §1, as might be expected.
The next triangle, consisting of the entries $m_{12}$, $m_{13}$ and $m_{23}$, can be regarded as the results
of a second three-point sampling which is shifted one interval to the right along the time
axis, the total catches $R_1$, $R_2$ and $R_3$ taking the places of $R_0$, $R_1$ and $R_2$ for the first triangle.
Thus we may estimate $P_1$ and $N_2$ by replacing the $m_{xt}$ in the above equations by
$m_{x+1,t+1}$ (x = 0, 1; t = 1, 2). Similarly, from the next triangle and the appropriate
$R_t$ (t = 2, 3, 4), we can estimate $P_2$ and $N_3$, and so on. Although we are sacrificing information
by neglecting all except the last two entries in each $m_{xt}$ column, this loss may not necessarily
be so great as it at first sight appears owing to the occurrence of either small numbers or
zero entries for the earlier values of x as time increases. Much, however, will depend on the
relation between the time interval adopted for the samplings and the expectation of further
life of the individuals sampled.

The actual calculations can be carried out very quickly when the data are arranged as
in the following table:

   t    Observed values              w_t          x_t               y_t         N_t            P_t
   0    R_0   -      -      -        -            -                 -           -              x_1/y_1
   1    R_1   m_01   m_02   m_12     m_01 m_12    m_02 R_1 + w_1    m_12 R_0    R_1 x_1/w_1    x_2/y_2
   2    R_2   m_12   m_13   m_23     m_12 m_23    m_13 R_2 + w_2    m_23 R_1    R_2 x_2/w_2    x_3/y_3
   3    R_3   m_23   m_24   m_34     m_23 m_34    m_24 R_3 + w_3    m_34 R_2    R_3 x_3/w_3    ...

The first four columns consist merely of the $R_t$ and the observed $m_{xt}$ for each successive
triangle. From these the three quantities $w_t$, $x_t$ and $y_t$ are formed, and it will be noted that
whereas the two former are calculated from the observations in the same row of the table,
$y_t$ is obtained by multiplying an $m_{xt}$ by the $R_t$ value for the previous row. The estimates
$N_t$ and $P_t$ immediately follow, the values of $P_t$ being entered one row higher than the values
of $x_t$ and $y_t$ from which they are calculated. If required, the adjusted estimates may be
used in place of the $N_t$, though for approximate work any bias in the latter is probably not of any
very great importance.

Having thus obtained the successive $N_t$ and $P_t$, we can then calculate $A_t = N_{t+1}/N_t$ and
hence by subtraction $B_t = A_t - P_t$ (t = 1, 2, ..., T - 2). Although the variances of $N_t$ and $P_t$ can
be estimated by means of the expressions given above, the variance of an estimate $B_t$ is
rather troublesome, owing to the complicated set of covariances which exist between two
successive triangles, and the author is unable to offer any solution to this problem. The
point is perhaps not very important, since the purpose of the calculations is to obtain a rough
picture of the general trend of these parameters over the whole period, rather than any
precise estimates which may be used in tests of significance.
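
The working of the successive triangles can be sketched in Python as follows. The function name and the dictionary layout of the recaptures are assumptions made for illustration, the counts in the example are artificial, and no allowance is made for zero entries, which would have to be dealt with before dividing.

```python
def triangle_estimates(R, m):
    """Approximate N_t, P_t and B_t from overlapping three-point triangles.

    R : list of total catches R_0, R_1, ..., R_T
    m : dict (x, t) -> m_xt, the Method B recaptures; only the entries
        (t - 1, t), (t - 1, t + 1) and (t, t + 1) of each triangle are used
    """
    T = len(R) - 1
    N, P = {}, {}
    for t in range(1, T):
        w = m[(t - 1, t)] * m[(t, t + 1)]
        x = m[(t - 1, t + 1)] * R[t] + w
        y = m[(t, t + 1)] * R[t - 1]      # uses the catch of the previous row
        N[t] = R[t] * x / w
        P[t - 1] = x / y                  # entered one row higher than x and y
    # B_t follows by subtraction from the ratio of successive N_t.
    B = {t: N[t + 1] / N[t] - P[t] for t in range(1, T - 1)}
    return N, P, B


# Artificial counts (not from the paper): four samples, two triangles.
R_demo = [100, 100, 100, 100]
m_demo = {(0, 1): 20, (0, 2): 10, (1, 2): 25, (1, 3): 10, (2, 3): 20}
N_est, P_est, B_est = triangle_estimates(R_demo, m_demo)
```

With these figures the first triangle gives w = 500, x = 1500 and y = 2500, so that the estimate of N_1 is 300 and that of P_0 is 0.6; the second triangle gives 300 and 0.75, and hence B_1 = 0.25.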
1. FOREWORD


Eight years ago the present writers submitted to Biometrika the first of a series of papers 
on the long-tailed field mouse, or wood mouse, Apodemus sylvaticus. It was there explained 
that our interest lies in determining whether local differences in the size and proportions of 
the body can be detected in this species. If so, can they be associated with the environ- 
ment, or are they due to local inbreeding? How far is interbreeding restricted by ecologic 
isolation or, in a continuous community, simply by distance alone? This first paper dealt 
with the growth of marked mice throughout the winter season when there was no breeding. 
A second paper showed that they were subject to a very high death-rate, only a small
proportion of the summer and autumn young surviving till the next year's breeding season;
this alone must have reduced the chances of very close inbreeding. The present paper gives
some data on the extent of their wanderings and therefore on the distance at which it was
possible for them to find a mate.

To the present readers of Biometrika these descriptive biological papers may appear out 
of place, though the authors may plead that they are in the early biometric tradition, as 
outlined fifty years ago in the Editorial foreword to the first volume and in Francis 
Galton’s note on ‘Biometry’ following that Editorial. Twenty-one of the twenty-three 
major contributions to that first volume were biological or anthropometric; the mathe- 
matician was only called in to aid the biologist in analysing his observations and field work. 
In the remaining two contributions the mathematical methods described were devised for 
biological problems and illustrated by biometric examples. With the course of time the use 
of such methods in the analysis of all kinds of statistical data has become better under- 
stood, and more and more orthodox, and work of this nature is accepted for publication in 
a wide range of specialized journals. Meanwhile statistical theory has developed into a vast 
and complex branch of mathematics to which the pages of Biometrika have become 
increasingly devoted. 

In the last volume of Biometrika, however, there appeared together three papers of 
biological interest, on theoretical methods of estimating the death-rate and size of animal 
populations; in them are references to other work on the same lines, to which considerable 
attention has been recently paid by statisticians. In the first of these Leslie & Chitty (1951) 
consider the problem in the case of small mammals when trapped alive, marked and 
released. Their methods depend on a number of assumptions, and are not, in this paper, 
related to any specified observational work under natural conditions. These assumptions 
are hard for a field biologist to concede. In practice a ‘population’ of mice is difficult to 
delimit unless on an island. Each mouse will not wander at random over the whole area 
but around its own home and possibly in the direction of a known food supply. The homes 
may be on the margin of the area or even outside it, and are likely to have some relation 
to the inevitable irregularities of the vegetation and soil conditions, and to the habits and 
distribution of predators. Weather conditions are known to have a considerable effect on 
the numbers caught in any one trapping, and so also may the duration of darkness. The 
age distribution of the population is continuously changing, and not only is the death-rate 
almost certainly different for mice of different ages, but probably also the extent of their 
wandering. The two sexes certainly have different wandering habits. 

Our own trapping programme was not planned to test any theory, or indeed to provide 
answers to the questions in which the writers referred to are primarily interested. As 
already explained, our interests lie in a different direction. Observations on the death-rate 
and density of wood mice in successive years have accumulated by the way, and are not 
in as useful a form as if the work had been planned to investigate such questions directly. 
We believe that many more facts need collecting in the field before the mathematician can 
usefully be called in to seek some clarifying formula, which in turn will point the direction 
for further observations. The most that we can hope for our own observations is that they 
will help future field workers to plan their trapping in the way most likely to throw light 
on their particular problem. In publishing in Biometrika we also hope there may be 
mathematical readers interested in having brought to their notice some of the complexities 
of animal life, and the difficulties met with in studying living populations, especially when
the information about them is distorted by some system of trapping.

We are indebted to Mr R. Versey for the care which he has taken in preparing the map
for publication, and to Prof. E. S. Pearson for suggesting many improvements in the text.


2. DIFFICULTY OF DETERMINING THE NATURAL RANGE OF MICE WHEN TRAPS ARE 
PLACED AT REGULAR INTERVALS 
2.1. Arrangement of traps in a grid pattern
When first trapping Apodemus in Dorset in 1937 we wanted to find out how far these mice 
were confined to certain environments or how far they were ubiquitous and wandered 
freely from one environment to another; we therefore placed our traps in irregular groups 
on sites chosen for their differences in plant covering. The distances between these sites 
were quite fortuitous and ranged from a few yards to over 3000 yards. When trapping 
on neighbouring sites was not simultaneous we often recaught a number of the same mice, 
and at distances of under 250 yards these recatches were frequent (Hacker & Pearson, 1951). 

During the following winters at Holwood Park in Kent, we wanted to observe the growth 
and survival of individual mice, so sought the best arrangement of traps to ensure catching 
at regular intervals the majority of mice living within a single large area. In view of our 
experience at South Haven Peninsula we decided to place six traps in a circle of 10 yards 
radius round each of the intersection points of a 100-yard grid. We drew the grid on the 
‘25 inch’ (1/2500) Ordnance Survey map (Kent XVI, 13, 1933) and then pegged out the 
intersection points on the ground. To mark our trap sites we placed a circle of six other 
pegs 10 yards from each other and 10 yards from the central peg, the first of them to the 
north-east of the latter; in actually placing the traps we chose the likeliest places within 
a few yards of these six pegs. The sites are shown as circles in the map on the folding sheet 
at the end of this paper. 

This hexagon of six traps was a variant of the three traps placed by Chitty (1937) at 
the intersection points of his 30 by 40 yard grid in Bagley Wood. In our work at South 
Haven Peninsula, where our groups of twelve to twenty-four traps were either in lines or 
scattered irregularly, we found that certain traps caught more than their share of the mice 
available, as if some spots were more frequented by mice than others or perhaps lay on 
mouse pathways. By choosing six spots for our traps instead of placing them all at one 
spot, we felt we would increase the chances of catching all the mice visiting or living in the 
neighbourhood. 

This arrangement of traps suited our main purpose but meant that it was impossible to 
record the maximum travels of individual mice in terms of any simple set of distances. 
Within a single hexagon of traps distances of 0, 10, 17 and 20 yards could be recorded; 
between two adjacent hexagons twelve different distances, varying from 80 to 120 yards 
when one hexagon lay to the north-east of the other, but from 83 to 118 yards when it lay 
to the north-west; between two hexagons placed diagonally on the grid, similar sets of 
distances lying between the limits 141 ± 20 yards. For hexagons with centres at 200 yards 
from each other the limits would be 200 ± 20 yards, overlapping the limits for a ‘knight’s 
move’ which would be 224 ± 20 yards. 
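
The limits quoted above follow from the geometry of the pegs and can be checked numerically. The following is a minimal sketch, assuming that the grid lines run north-east and north-west (as the wording above implies), that each trap sits exactly on its peg, and that the x-axis is taken along the grid line passing through the first peg. The function names are illustrative only and form no part of the original study.

```python
import math
from itertools import product

RADIUS = 10.0    # yards from the central peg to each trap peg
SPACING = 100.0  # yards between neighbouring grid intersections


def hexagon(cx, cy):
    """Six peg positions round (cx, cy); the first peg lies along the x-axis."""
    return [
        (cx + RADIUS * math.cos(math.radians(60 * k)),
         cy + RADIUS * math.sin(math.radians(60 * k)))
        for k in range(6)
    ]


def trap_distances(steps_ne, steps_nw):
    """Distinct trap-to-trap distances (yards, 2 d.p.) between two hexagons
    whose centres are the given number of grid steps apart."""
    a = hexagon(0.0, 0.0)
    b = hexagon(steps_ne * SPACING, steps_nw * SPACING)
    return sorted({round(math.dist(p, q), 2) for p, q in product(a, b)})


within = trap_distances(0, 0)    # traps of a single hexagon
to_ne = trap_distances(1, 0)     # neighbour along the line of the first peg
to_nw = trap_distances(0, 1)     # neighbour at right angles to that line
diagonal = trap_distances(1, 1)  # diagonal of one grid square
knight = trap_distances(2, 1)    # knight's move

print("within one hexagon:", within)
print("north-east neighbour:", len(to_ne), "distances,", to_ne[0], "to", to_ne[-1])
print("north-west neighbour:", round(to_nw[0]), "to", round(to_nw[-1]))
print("diagonal:", diagonal[0], "to", diagonal[-1])
print("knight's move:", knight[0], "to", knight[-1])
```

Under these assumptions the neighbour along the line of the first peg yields twelve distinct distances from 80 to 120 yards, and the neighbour at right angles yields limits that round to 83 and 118 yards, in agreement with the figures given in the text.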

If all such intervals were set out in numerical order, not only would they make a very 
irregular series, but each would have a different chance of being recorded. For instance, 
within a single hexagon of traps, 0 yards could be recorded at six places, 10 yards could 
be recorded between six pairs of traps, 17 yards between six pairs of traps, but 20 yards 
between only three pairs of traps, while between two adjacent hexagons the chances 
become still more complicated. The availability of each interval would further depend on 
the number of pairs of hexagons between which such an interval could occur, and this 
would depend on the size and shape of the grid. 
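
The unequal availability of the intervals within one hexagon can be illustrated by a direct count. This is a small sketch only, assuming six traps placed exactly on a circle of 10 yards radius at 60 degree steps; the variable names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

RADIUS = 10.0  # yards from the central peg to each trap

pegs = [
    (RADIUS * math.cos(math.radians(60 * k)),
     RADIUS * math.sin(math.radians(60 * k)))
    for k in range(6)
]

# Count unordered pairs of distinct traps at each distance (nearest yard).
pair_counts = Counter(round(math.dist(p, q)) for p, q in combinations(pegs, 2))

for distance in sorted(pair_counts):
    print(f"{distance} yd: {pair_counts[distance]} pairs")
```

The count gives six pairs at 10 yards, six at 17 yards and three at 20 yards, so that even within a single hexagon the longest interval has only half the chance of being recorded that the shorter ones have.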

In setting out a frequency distribution of maximum distances travelled it is necessary 
to emphasize this irregularity of the values of the variate and of the chances of any one of 
the values occurring. Chitty (1937) and Evans (1942), in quoting the frequencies yielded 
by their own different systems of trapping, have pointed out the difficulties of interpreta- 
tion, but further warning seems necessary. We fear that biologists may still be tempted to 
calculate an average distance from such tables, and to regard this as the expected distance 
under natural conditions, descriptive of the average wandering power of the mice. 


2-2. Dependence of travel records on mesh of grid 


The artificial nature of the frequencies with which mice are observed to travel certain 
distances when traps are arranged on a grid system is not only because some of the dis- 
tances between traps occur more frequently than others, but also because the mice 
themselves behave differently when enmeshed in grids of closer or wider spacing. On the 
one hand a rigid system of trapping sites spaced at 100 yard intervals, quite regardless of 
the home centres of the mice, is clearly too wide in mesh to be of much use in tracing the 
intricate pattern of their wanderings. On the other hand, we found that this spacing was 
sufficiently close to limit the observed range of this wandering, since Apodemus appears 
liable to find and be intercepted by any traps in its neighbourhood; the closer the grid, 
the nearer home is it caught. For this reason Chitty (1937) increased his spacing from 
5 by 5 yards to 30 by 40 yards. 

We ourselves pegged out our grid in 100 yard squares, but during the greater part of 
our first winter’s trapping at Holwood (1937-8) we only set our traps round pegs 200 yards 
apart, shifting them once a fortnight onto other sites, still at this distance apart, so that 
the whole grid was covered in four settings; we made this rotation twice. At the end of the 
season we set traps at all the hexagons where mice had proved to be plentiful and also at 
a number of additional points within the grid, trying to make sure of catching all surviving 
mice. The length of wandering which we observed under this system was much greater than 
that observed in the following season (1938-9), when we set the traps over a smaller 
total area but at neighbouring hexagons simultaneously, so that the spacing was only 
100 yards. 
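
The rotation of 1937-8 can be pictured as a division of the grid intersections into four sets, each with a mesh of 200 yards. The sketch below assumes a simple parity rule for allotting sites to settings and a hypothetical block of six by six intersections; the sites actually used in each week are those recorded on Charts VI and XIII, not those produced by this rule.

```python
from itertools import product

SPACING = 100  # yards between grid intersections


def four_settings(n_rows, n_cols):
    """Split grid intersections into four settings, each 200 yards in mesh.

    The parity rule is an illustrative assumption, not the recorded scheme.
    """
    settings = {(r, c): [] for r, c in product((0, 1), repeat=2)}
    for row, col in product(range(n_rows), range(n_cols)):
        settings[(row % 2, col % 2)].append((row * SPACING, col * SPACING))
    return list(settings.values())


settings = four_settings(6, 6)  # hypothetical 6 by 6 block of intersections
for number, sites in enumerate(settings, start=1):
    print(f"setting {number}: {len(sites)} sites")
```

Each intersection falls in exactly one setting, so four settings cover the whole grid once, while no two sites trapped at the same time are nearer than 200 yards.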

For 1937-8 the maximum distances between traps in which the same mouse was caught 
are set out in the first column of Table 1. Catches at points not in the regular hexagons 
are ignored. The shorter distances are grouped into (a) those within a single hexagon, 
(b) those between two adjacent hexagons, (c) those between two hexagons placed diagonally, 
(d) those between alternate hexagons in the same row and (e) those between two hexagons 
at a knight’s move from each other. There are comparatively few records of the same 
mouse being caught in two hexagons farther apart than this, and where it occurred the 
actual distance between traps is given (f). An idea of the area which mice covered in their 
nightly wanderings is given by the patterns in the second column of the table. Each of 
these shows all the positions in which the same mouse was caught, while the figures in the 
first column show the distance between those positions that were farthest apart, the greatest 
distance over which the mouse was observed to range. The body of the table shows how 
frequently each pattern was observed in the first two years of our trapping in Holwood 
Park. It will be seen that the modal range lies in the 80-120 yard group for 1937-8 and 
in the 0-20 yard group for 1938-9; that is to say, under the first system of trapping a mouse 
caught more than once was found most commonly in two neighbouring hexagons and not 
infrequently in hexagons even farther apart than a diagonal of the grid, whereas under the 
second system it was caught far more commonly in one hexagon only, and only once at a 
distance greater than that of a diagonal. 


Table 1. Observed range of mice under different systems of trapping 
on a 100 yard grid 

                                                 Frequency of range
                                     Within whole area         Within part of area
                                        of trapping           common to both seasons
                                     1937-8      1938-9        1937-8      1938-9
Observed            Pattern of
range (yd.)         sites visited   ♂♂   ♀♀    ♂♂   ♀♀      ♂♂   ♀♀    ♂♂   ♀♀
(a) 0 to 20         (diagram)       13   23    69   69       2    2    45   42
(b) 100 ± 20        (diagram)       20   25    34    6       5    6    21    3
             (c1)   (diagram)        7    4     5    -       1    2     2    ?
(c) 141 ± 20 (c2)   (diagram)        4    6    11    1       2    1     6    -
             (c3)   (diagram)        ?    -     1    -       -    ?     ?    ?
                    (diagram)        1    ?     -    -       -    1     ?    ?
(d) 200 ± 20        (diagram)        1    ?     ?    ?       1    ?     ?    ?
(e) 224 ± 20        (diagram)        1    3     -    -       -    -     -    -
                    (diagram)        4    1     ?    ?       ?    ?     -    ?

Note. In column 2, the larger dots indicate the points in the grid pattern 
where a mouse was caught. 


2-3. Effect of size, shape and nature of area covered by grid 


If we were to confine ourselves to a study of these frequency distributions alone, we 
might accept that of 1937-8 as the best evidence we have of the amount which mice 
wander under natural conditions, and regard the lower mode of 1938-9 as due solely to the 
interception of mice by the closer spacing of the trapping sites. Further consideration of 
the reasons for the difference between the two years suggests, however, that it was not 
only due to the shifting of traps about the grid giving a wider spacing in 1937-8 but also 
to the greater and more varied area covered in that season. In 1938-9 traps were set in 
only twenty-one hexagons, three parallel rows of seven in the wooded part of the park; 
sixteen peripheral hexagons surrounded a single inner row of five. These hexagons are 
marked as double circles on the map. Not only were there far fewer ways in which the 
longer distances could be recorded within this strip of grid, but a high proportion of the 
mice caught were probably visitors from the surrounding area where there was no trapping 
to determine the extent of their range. In 1937-8 there were two far larger and nearly 
square trapping areas, and about as many hexagons were in the centre of these as on the 
periphery. Traps were set at all hexagons marked on the map except those nine with double 
circles projecting to the west in the wooded part of the park. If the number and arrange- 
ment of the hexagons in which individual mice were caught in this season is related to their 
place on the grid, it appears that for nine out of the thirteen males and fifteen out of the 
twenty-three females caught more than once but at one hexagon only, this hexagon lay on 
the periphery, while for a large proportion of the males caught at two adjacent hexagons, 
one or both of these hexagons also lay in the peripheral row. Evidently the great number 
of short distances which were recorded had no relation to the wandering powers of the 
mice but were at least in part an effect of the edge of the grid. 
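
The weight of the edge in a narrow strip of grid is easily shown by counting. The sketch below assumes rectangular blocks of hexagons; the three by seven strip is that of 1938-9, while the six by six block is a hypothetical square area introduced only for comparison.

```python
def edge_and_centre(n_rows, n_cols):
    """Return (peripheral, inner) counts of hexagons in a rectangular block."""
    total = n_rows * n_cols
    inner = max(n_rows - 2, 0) * max(n_cols - 2, 0)
    return total - inner, inner


strip_1938 = edge_and_centre(3, 7)    # three parallel rows of seven
square_block = edge_and_centre(6, 6)  # hypothetical nearly square area

print("1938-9 strip (peripheral, inner):", strip_1938)
print("6 by 6 block (peripheral, inner):", square_block)
```

For the strip the count is sixteen peripheral hexagons to five inner ones, as stated above, whereas in a block approaching a square the two classes become more nearly equal.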

Still further analysis shows that other important factors contributed to the frequency 
with which the different distances were observed. The grid has so far been considered as 
if it lay upon a uniform background, every part of which was equally likely to contain or 
to attract mice. But in 1937-8 the background was in fact very varied, an irregular mix- 
ture of woodland, grassland, arable land and scrub. The more westerly of the two sections 
of the grid lay in the woods and grassland of Holwood Park, while the eastern section lay 
partly in woods and scrub, partly in farmland. 

An examination of the map shows that the margin of the western section was bordered 
on the east by woods which were separated by a wide stretch of open grass from the main 
body of woodland west of Holwood House. Of the mice caught on the periphery in one or 
two hexagons only, twenty-nine were caught in this marginal woodland or in the grass just 
outside it. They either did not travel farther out into the grassland or only made rare 
excursions into it without rambling there sufficiently to find the traps. 

Yet another factor was the length of time for which the mice survived. There was less 
chance of recording a long distance for mice which disappeared early in the season, pre- 
sumably through the action of predators, and a number of mice in the short-distance groups 
of Table 1 belong to this category. Among mice caught once only, which are not included 
in the table, there were few exceptions to the rule that those caught in the more central 
hexagons were caught near the beginning of the season. 

2-4. Observed range of mice surviving in centre of grid throughout the winter 


Excluding for the moment all the records from our eastern area, where the grassland 
was very extensive and the arable land presented new problems since it was only tem- 
porarily a habitat, we find that in the western or park area the distance records for 1937-8 
can be made to assume a new and higher mode by subtracting those of all mice of low 
range caught on the periphery and all mice not caught after January. In Table 2 the 
distance frequencies are given for the western area both with and without this subtraction. 
It will be seen that for both sexes the number of mice caught always in the same hexagon 
is reduced to one, and though the mode for females does not rise above 80 to 120 yards, the 
mode for males is now 180 to 244 yards. 

In the eastern or fields area there was a high proportion of grassland rarely visited by 
mice and ten out of the twenty peripheral hexagons lay in it. So few mice were caught in 
these peripheral hexagons that traps were only set in them once. In this area, therefore, few 
short-distance travels were observed round the periphery. In the centre of the area the 
grassland almost surrounded some of the most frequented localities. While on the one hand 
this appears to have restricted wandering, on the other hand several exceptionally long- 
distance travels were recorded across the grassland from one of these localities to another. 
In spite of these differences from the park area, the elimination of peripheral catches and of 
mice disappearing early from the area does not raise the mode for males above a knight’s 
move or that for females above a move between two adjacent hexagons. 


Table 2. Observed range of mice caught in park area only, 1937-8. Frequencies (A) when 
all mice are counted, (B) after deducting mice not caught after January and mice of low 
range caught on periphery of area 

                          A                      B
                     Frequencies        Corrected frequencies
Observed range
(yd.)                ♂♂    ♀♀              ♂♂    ♀♀
0 to 20              10    16               1     1
100 ± 20             10    12               1     5
141 ± 20              3     6               1     ?
200 ± 20              2     0               2     0
224 ± 20              5     1               5     ?
> 224                 1     1               1     1


3. METHOD OF PRESENTING RECORDS 


It is now clear that we cannot get very far in studying the wandering habits of Apodemus 
by confining ourselves to a table of the greatest distances between any two traps in which 
a mouse happened to have been caught. We need to consider all the positions in which it 
was caught on the grid against a background of the map, and how far the pattern of these 
positions may be the result of the limited extent of the grid or of the nature of the country 
in which the traps lay. We need also to consider the dates on which the mouse was caught 
and whether its early death or late appearance within the grid area may not have limited 
our chances of observing wider movements. 

It is difficult to present the data in a form in which all these variables and their inter- 
relationships can be easily studied. It is quite impossible within the scope of a single two- 
dimensional diagram. The map given at the end of the paper shows the shape of the grid 
in the two areas chosen for trapping, but only roughly indicates the major differences in 
vegetation. More detail of these differences is given below in §§ 4-1 and 5-1. In the 
thirteen charts, printed with the map on the two folding sheets at the end of the paper, the 
two areas are taken separately; the centres of the hexagons are indicated by dots, and 
only sufficient of the map is shown to help the eye to pick out corresponding hexagons in 
the different diagrams. 


3-1. Explanation of charts 


In these diagrams dots indicate the centres of the regular hexagons and crosses ( x ) the points at 
which mice were caught outside the hexagons. 

In Charts I-IV and VIII-XII every mouse caught more than once in the 1937-8 trapping season is 
represented by a letter placed next each hexagon or extra point at which it was caught. All the mice 
not caught after January are represented by small letters, the survivors by capital letters; capital 
letters are in bold type if the mice were not caught till the irregular trapping at the end of the season 
(pp. 398, 402). The numbers after the letter are those of the weeks in which the mouse was caught. For 
this and subsequent years, we numbered the weeks from the first Monday in October; the dates 
of trapping in these weeks are shown for 1937-8 in the Calendar of Trapping (§ 3-2). The mice 
are allotted to different charts according to area, to sex and to the observed extent of their range, 
as indicated in the summary given below (§ 3-2). The lettering starts afresh in each diagram, so that 
there is no connexion, for example, between mouse A of Chart I and mouse A of Chart II. The symbol + 
after a number, e.g. H 27+ of Chart II, means that the mouse was found dead in the trap in that week 
or died before release. 

In Charts V and VII, showing mice caught once only, the sexes are combined, and each mouse is 
represented, not by a letter, but only by the number of the week in which it was caught. Males are 
placed above, females below, and mice whose sex we failed to determine to the left of the dot marking 
the centre of the hexagon. Here again figures in bold type refer to mice not caught till the irregular 
trapping in weeks 20 and later. 

Charts VI and XIII show the scheme of trapping in the two areas. The numbers at each site are those 
of all the weeks in which traps were set there. Our general method of shifting the traps is described on 
p. 392, but it will be seen from these diagrams that at some of the sites we only set traps once in the 
two rotations, which lasted from week 2 to week 19. The points at which traps were set between the 
regular hexagons in the final weeks of trapping (weeks 20-30) were too numerous to be shown on these 
two charts. Their positions are described in § 4-3 for the western area and § 5-3 for the eastern area, 
where the weeks in which the traps were set there are also given. Only those points at which mice 
were actually caught are marked by crosses ( x ) on the other charts. 


3-2. Summary of charts 
Western or park area 
Chart I. ♂♂, shorter ranges (patterns (a), (b) and (c,) of Table 1). 
Chart II. ♂♂, longer ranges (patterns (c,), (d), (e) and (f)). 
Chart III. ♀♀, shorter ranges (patterns (a) and (b)). 
Chart IV. ♀♀, longer ranges (patterns (c,), (c,), (e) and (f)). 
Chart V. ♂♂ and ♀♀ caught once only (weeks in which caught). 
Chart VI. Trapping scheme (weeks in which traps were set in each hexagon). 


Eastern or fields area 
Chart VII. ♂♂ and ♀♀ caught once only (weeks in which caught). 
Chart VIII. ♂♂ and ♀♀, longest ranges (patterns (f) of Table 1). 
Chart IX. ♂♂, long ranges (patterns (c,) and (e)). 
Chart X. ♀♀, long ranges (patterns (c) and (e)). 
Chart XI. ♂♂, short ranges (patterns (a), (b) and (c,)). 
Chart XII. ♀♀, short ranges (patterns (a) and (b)). 
Chart XIII. Trapping scheme (weeks in which traps were set in each hexagon). 


Notes 

(1) As there were fewer short range males than females, in order to avoid overcrowding Charts II and 
IX, the male mice with patterns (c,) have been classed with the short range groups on Charts I and XI. 

(2) The greatest range of certain mice was from a hexagon to an extra point, or from one such point 
to another. These mice are placed on the charts according to length of range and not according to 
hexagon pattern. They are as follows: 

II G, 164 yards; VIII E, 375 yards; VIII F, 358 yards; IX B, 212 yards; and X D, 123 yards. 

Calendar of trapping, 1937-8 

                     Western or park area             Eastern or fields area
                   No. of week  Dates of trapping    No. of week  Dates of trapping
First rotation:         2       12-16 Oct.                3       19-21 Oct.
                        5       2-4 Nov.                  6       9-11 Nov.
                        7       16-18 Nov.                8       23-25 Nov.
                        9       30 Nov.-2 Dec.           10       7-9 Dec.
Second rotation:       11       14-16 Dec.               12       21-23 Dec.
                       14       4-6 Jan.                 15       11-13 Jan.
                       16       18-20 Jan.               17       25-27 Jan.
                       18       1-3 Feb.                 19       8-10 Feb.
Final trapping:        20       14-18 Feb.                -       -
                       21       21-26 Feb.               22       28 Feb.-5 Mar.
                       23       7-10 Mar.                24       15-19 Mar.
                       25       21-26 Mar.                -       -
                       30       27 and 30 Apr.           30       25 Apr.
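
The week numbers in the Calendar can be recovered from the dates by the rule given in § 3-1. The sketch below assumes that week 1 is the week beginning on the first Monday in October, a reading which agrees with every entry in the Calendar; the function names are illustrative only.

```python
from datetime import date, timedelta


def first_monday_in_october(year):
    """Date of the first Monday in October of the given year."""
    day = date(year, 10, 1)
    return day + timedelta(days=(7 - day.weekday()) % 7)


def trapping_week(day, season_start_year):
    """Week number of `day`, the week of the first Monday in October being week 1."""
    start = first_monday_in_october(season_start_year)
    return (day - start).days // 7 + 1


print(first_monday_in_october(1937))
print(trapping_week(date(1937, 10, 12), 1937))  # first day of trapping, park area
print(trapping_week(date(1938, 2, 14), 1937))   # start of the final trapping
print(trapping_week(date(1938, 4, 25), 1937))   # last trapping, fields area
```

For the 1937-8 season the first Monday falls on 4 October, and the three dates shown fall in weeks 2, 20 and 30 respectively.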


3-3. Hedgerow trapping south of the grid 


In week 4, not included in the Calendar, we laid fifty-nine traps for three nights in a broken line along 
hedgerows and patches of woodland to the south of our regular areas. The line was about three-quarters 
of a mile long and is marked on the map by rows of dots, each dot representing a single trap. Twenty- 
two of these traps, in the middle of the line, were set again for three nights at the end of week 21. 


4. WESTERN OR PARK AREA 


4-1. Description of area 


The western section of our grid lay in the park surrounding Holwood House. Half the 
trap hexagons were in the very mixed woodland between the house and the main road 
from Bromley to Westerham. Most of the other hexagons were in the grassland to the 
north-east of the house. In places this grassland was dotted with trees; it included a 
cricket pitch surrounded by a wide area of mown grass, and a large paddock bordered by 
Lake Wood to the north and by a narrow shaw, or strip of wood, to the south-east. A strip 
of rough grass and bracken, marshy in places, and with seedling birches and a few scattered 
trees, separated the mown grass round the cricket pitch from the main drive, while a drier 
bracken area with occasional young trees and thorn bushes lay between this drive and three 
great earthen ramparts west of it. These ramparts, known locally as the Bulwarks, are 
part of an Iron Age camp, much of which has been gradually levelled or quarried away, 
revealing, it is said, a stockade of blackened and crumbling oak trunks against which the 
earth had been piled (Kempe, 1815). William Pitt is said to have planted groups of firs on 
the ridges, while oaks have sprung up as seedlings in the ditches. In 1937 the ramparts 
were deep in bracken and fenestrated with rabbit holes. Just west of them, in a little valley 
parallel to the present main drive, was a second drive densely lined with rhododendron 
bushes which were spreading up the slopes on either side. 

The woodland to the west of the Bulwarks was of a very mixed and patchy nature 
reflecting the various tastes, aspirations and resources of its successive owners from the 
eighteenth to the twentieth century. Periods in which ground was cleared and trees planted 
in clumps or belts for ornament or privacy alternated with periods, like the present, when 
the vegetation natural to the Blackheath pebble beds reasserted itself. The chief differences 
between this vegetation when inside the park fence and when outside it on Keston Common 
seemed to be due to the protection given by enclosure and the shelter of planted trees. 
The bracken grew more luxuriantly inside the fence, and seedling oaks had a chance of 
establishing themselves, but mingled with the birches and oaks were the seedlings of alien 
trees such as turkey oaks, beeches and spanish chestnut. Open patches of turf, kept short 
by rabbits, marked a part of the common enclosed by William Pitt; they were shown by 
the molehills and rabbit scrapings to have had a heavy dressing of chalk at some forgotten 
date. An excellent map prepared by Thomas Milne in 1790 and published in 1806 shows 
the extent of this enclosure, and how the present high road to Bromley and London was 
constructed round the new boundary when Pitt closed the old coach road which passed 
close to his house and along the present main drive across the camp. This map shows a belt 
of trees along the old boundary, the position of which can still be traced. Some of the great 
beeches which in places keep the ground bare of undergrowth may have been part of this 
belt, while in other places close-planted clumps of conifers are said to mark the burial 
sites of famous Derby race-horses. 

Our grid was placed arbitrarily upon this varied patchwork of vegetation, and each 
hexagon of traps, often each trap within a hexagon, lay in an environment distinct from 
that of its neighbours. Round the house there was still greater variety; some of our traps 
were in a corner of the vegetable garden by the potting sheds, others by the dog kennels 
and in the shrubbery between these and the stables. 


4-2. Numbers and distribution of mice in the park area 


In our 1937-8 trapping we caught Apodemus sylvaticus over the whole of this western 
section of the grid. The 152 mice caught (75 males, 73 females and 4 of undetermined sex) 
were scattered over the area, but were rare on the grassland where this was more than 
100 yards from woodland or bracken. Certain hexagons within the woodland also seemed 
unpopular, but the season happened to be one in which the total mouse population was 
unusually small. Though mice were caught at all but two of the forty-eight hexagons set, 
they only entered 142 or less than half the 288 traps. At the twelve hexagons in the mixed 
woodland to the west of the Bulwarks, where traps were also set in the subsequent years, 
only thirty-eight mice were caught and they entered only forty-nine of the seventy-two 
traps. In the following season there was a great increase in population; from November 
to March inclusive 173 mice were caught in these same twelve hexagons and they entered 
all but two of the traps. 
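
The proportions implied by these counts may be set out as follows. This is a sketch of the arithmetic only, assuming six traps at every hexagon as described in § 2-1; the names are illustrative.

```python
TRAPS_PER_HEXAGON = 6


def fraction_entered(traps_entered, hexagons):
    """Proportion of the traps set at `hexagons` sites that caught a mouse."""
    return traps_entered / (hexagons * TRAPS_PER_HEXAGON)


total_mice_1937 = 75 + 73 + 4                  # males, females, sex undetermined
whole_area_1937 = fraction_entered(142, 48)    # 142 of 288 traps
woodland_1937 = fraction_entered(49, 12)       # 49 of 72 traps
woodland_1938 = fraction_entered(72 - 2, 12)   # all but two of 72 traps
increase = 173 / 38                            # mice caught at the same twelve hexagons

print(f"mice caught in 1937-8: {total_mice_1937}")
print(f"traps entered, whole area 1937-8: {whole_area_1937:.0%}")
print(f"traps entered, twelve woodland hexagons 1937-8: {woodland_1937:.0%}")
print(f"traps entered, same hexagons 1938-9: {woodland_1938:.0%}")
print(f"ratio of mice caught, 1938-9 to 1937-8: {increase:.1f}")
```

It should be remembered that the figure of 173 mice refers to November to March inclusive, so that the ratio of the two seasons is only a rough guide to the increase in population.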

Within the period 1937-42 we found that not only the total number of mice but also 
their relative frequency at the different hexagons changed from year to year. We also found 
in some years that those hexagons where exceptionally many mice were caught at the 
beginning of the season were among the most barren before the end, as if some predator 
had settled there, or the dying down of a ground vegetation such as bracken had left the 
mice more vulnerable. In no case, however, could any particular feature of the environ- 
ment, such as the close overhead cover of the rhododendron bushes, the dense bracken 
areas, the small patches of open turf, or the bare ground under large beech trees, be picked 
out as especially favourable or unfavourable to Apodemus. 


4-3. Final intensive trapping in park 


In 1937-8, after the traps had been shifted twice round all the central and some peripheral 
hexagons (Chart VI), the following intensive trapping was undertaken in the hope 
of catching all the mice that still survived. 

(a) Ninety-two traps in irregular groups in the centre of the area, some in the rhododendrons 
along the valley to the west of the Bulwarks, others in bracken or hawthorn areas 
on either side of this valley. For five nights in week 20. Because of a heavy fall of snow 
in this week we set the traps here again for four nights in week 21. 

(b) Six traps at each of fourteen hexagons surrounding the central area (a). For two 
nights in week 23 (see Chart VI). On the following two nights these eighty-four traps were 
spread out in an irregular line joining the fourteen hexagons. 

(c) Two traps at each of thirty-seven hexagons, or every hexagon except eleven of those 
on the periphery (see Chart VI). For six nights in week 25. 

(d) Eighty-five traps were distributed among nineteen hexagons and a few near points. 
The sites chosen were those where most mice had been caught during the earlier months. 
For one night in week 30. 

In these final weeks of trapping twenty new mice were caught. Of these, sixteen were 
caught once only, fourteen of them on the periphery and two just inside it. The remaining 
four were caught two or three times, also on the periphery or near it. This suggests that 
late comers were casual wanderers from outside the area rather than immigrants, and that 
the earlier trapping was sufficient to catch all mice living in the centre. 


4-4. Notes on Charts I-V (see p. 396 for explanation) 


Chart I. All the twenty-two mice represented by letters in this diagram are males caught 
more than once but at not more than two of the hexagonal trapping sites of the grid. Four 
of them were also caught in an extra trap placed between the regular sites. None of them 
was caught in two traps farther apart than a diagonal of the grid. It will be seen that the 
capital letters, representing mice which survived at least until the 18th week of our 
trapping season, are confined to the periphery. In the centre of the grid there are only 
three mice and these are represented by small letters (d, e and f), indicating that they dis- 
appeared early from the area. G is perhaps exceptional, as its observed range lay entirely 
within (though near) the periphery yet was only 46 yards, although the mouse was caught 
four times and survived in the area until at least the 21st week. 

Chart II. The ten males represented in this diagram were caught at least three times and 
two of them seven times during our trapping season. All but two showed a range of over 
180 yards. Of the two exceptions G was caught four times and in three different places but 
was not caught until the 16th week; it had a range of 167 yards. C was caught five times 
in four different places with an observed range of only 130 yards. 

H is unusual in being caught in two hexagons at 200 yards distance and not in the inter- 
mediate hexagon. It was one of the few mice within the periphery which was not caught 
until March, but it is impossible to say whether it was an immigrant or had a home centre 
somewhere in the garden of Holwood House and had failed to wander out so far, or to 
chance on our traps, earlier in the season. Both H and G were finally caught in snap traps 
set by a groom on two small vegetable plots bordering our regular sites in the woods, the 
one in week 27, the other in week 24, when our own traps were not out. 

The wide ranges of the four mice A, B, C and D in the wooded part of the park were 
not observed until the irregular trapping at the end of the season (§ 4-3). Without this 
trapping A and G would have been caught in only a single hexagon, B and C in two ad- 
jacent hexagons, and D in three hexagons with a range of only 122 yards. It does not seem 
to have been the disappearance of competitors which encouraged this late wandering, for 
the mice which failed to survive in the area disappeared much earlier (Charts I and V). 
It was more probably the search for food which was only then becoming scarce in the 
woods. In the grassland to the north-east of the house wide ranges were recorded earlier 
in the season, as they were throughout the fields area. The mouse J, for instance, was 
caught six times and in five different hexagons in the winter months, with a greatest range 
of just over 280 yards, or more than a knight’s move. This mouse was caught on the 
cricket pitch and the wide area of more roughly mown grass surrounding this, but never 
within the fringing wood. It was last caught at the beginning of February before a period 
when snow covered the ground for nearly a week and may well have rendered so exposed 
a habitat untenable. The mouse L ranged over the paddock to the south-east of J’s area. 
It was caught at one extreme of a knight’s move in weeks 9 and 18, at the other extreme 
in week 16. Unlike J, this mouse survived until at least week 25 when it was caught, as in 
weeks 5 and 14, between these extremes. 

The general picture is of a number of mice which, together with the peripheral mice of 
Chart I, were scattered irregularly over the area, each with a range overlapping that of its 
neighbours on every side. The scattering was denser in some parts, thinner in others, while 
in certain neighbourhoods no males were caught at all, or none which survived there until 
the end of the season. 

Chart III. The thirty mice represented in this diagram are all the females caught more 
than once but in only two places and these not farther apart than two adjacent hexagons. 
Those caught in only one hexagon or in one hexagon and a near point are nearly all on or 
near the periphery of the area. Those caught in two adjacent hexagons will be found 
scattered over the area, in contrast to the peripheral position of the males showing this 
range in Chart I. W, well in the centre of the area, was caught as many as seven times 
during the season but always in one or other of two adjacent hexagons. Small letters 
again indicate that a mouse was not caught after January. X is the only mouse within 
the periphery which was first caught later than week 18, but again we cannot tell whether 
it was a resident which never chanced on the traps or whether it wandered in from outside 
the area. 

Chart IV. The eight females in this diagram are all those in the park area with an ob- 
served range of over 120 yards. In only one case, D, was this range a knight’s move. In 
all other cases except X it was a diagonal. X travelled 427 yards and appears to have been 
a migrant to the neighbourhood of Holwood House, but may have been merely on a long 
ramble to the northern corner of the area when caught there in the 11th week. 

All these females are known to have survived on the area throughout the season, and 
together with the large number of similar survivors in Chart III are seen to be scattered, 
like the males, irregularly over the area. Females were more rarely caught in the paddock 
than males, but a number were caught throughout the season in the open bracken area 
north-west of the cricket pitch, where no males appear to have survived; three males and 
one female died in the traps in this area (Chart V). 

Mice caught once only. Chart V. These mice will be found to resemble those with a short 
observed range. Except on the periphery they were mice which disappeared from the area 
early in the season; in the eighteen central hexagons there was only one caught later than 
week 9. 

The hexagon on the western margin where eight mice were caught in week 5 was near 
a tree where an owl settled, which may have accounted for their disappearance. This area 
became bare of mice, for only one was caught here in week 14 and none in week 25. At 
seven other peripheral hexagons traps were only set once (Chart VI), and mice with a range 
lying outside the grid may have had little chance of being caught a second time; this is 
especially true in the case of corner hexagons. 

The mouse marked (A) and caught in week 7 in the woods north-west of Holwood House 
had been caught previously in week 4 in the hedge of a small copse in farm land 645 yards 
to the east. The mouse marked (X) and caught in week 18 on the north-west margin of the 
area was recaught in the root field in the eastern area at a distance of 545 yards (see p. 404 
and Chart XI). The ranges of these two males are the widest observed at Holwood and are 
shown on the map (A to A, and X to X). 


5. EASTERN OR FIELDS AREA 
5·1. Description of area 


In the eastern section of our grid the main interest lay in the way in which wide stretches 
of uninhabited grassland separated or encircled localities more frequented by mice, and in 
the degree to which the movements of the mice were limited or extended for this reason. 
In 1937-8 the mice were mainly concentrated in two areas. (1) A field of turnips and 
marrow-stem kale about 8 acres in extent, together with the margin of an oak wood 
bordering it to the north; the field, which was cropped with corn the previous year, must 
have been clear of mice when sown in the spring before our trapping, but large numbers 
were caught there six months later. (2) A hawthorn thicket about 200 yards south-east of 
(1) and covering about 2 acres; this thicket had a core of older trees and was spreading 
into the surrounding grassland. Five of our trap hexagons lay just within the margin of 
the root field and one in the centre; one hexagon lay in the centre of the thorn thicket. 
Both these areas were surrounded on three sides by grassland, grazed by sheep or young 
cattle. 

Over a hundred yards across the grassland to the west of the root field lay Lake Wood, 
separating the eastern and western sections of the grid, while 200 yards to the east waist- 
high hawthorn, briars and brambles had replaced felled woodland in 2 or 3 acres known as 
Bushy Viners. To the south of the root field lay a steep sandy slope named Broom Bank, 
with the scattered stumps of trees which had been felled 12 years previously. This Bank 
covered about 6 acres, and in 1937 was rather open with a patchy covering of rough grass, 
bracken, brambles, broom, a little gorse, seedling birches, and a few shrubby trees; in the 
1933 ordnance map it is marked as adjoining the root field, but in 1937 it was separated 
from this by a 50-90 yard strip of rough grass fenced off from the southern end of the 
field. To the west of Broom Bank stretched a narrow 33 acre arable field, newly sown with 
winter oats in the autumn of 1937; to the east, separated from Broom Bank by a shallow 
gully and surrounded by grassland on its three other sides, lay the thorn thicket or second 
of our mouse-frequented areas. Farther east still, less than 100 yards across this grass from 
the thicket, was a covered reservoir in the corner of a large wood and just south of the 
reservoir, within the wood itself, were a number of pig pens. These pens may have provided 
food for mice but they were two or three hundred yards from the thicket, outside the area 
of our grid, and we do not know whether mice travelled between the two. 

To the south of the whole area about 200 yards of grassland separated Broom Bank and 
the corn field west of it from the hedgerows along which we laid traps in October and 
February (§ 3·3). 


5·2. Number of mice caught in the fields area 


Fifty-eight mice were caught at one time or other in the 8-acre root field, and nineteen 
in the 2-acre thorn thicket. On Broom Bank mice proved unexpectedly scarce; only 
seventeen were caught on the whole 6 acres and these nearly all along the western border 
adjoining the corn field. Four of the Broom Bank mice were caught 100 yards out in the 
corn field itself, together with one other mouse. On the periphery of the area seven mice 
were caught in Lake Wood at the western corner, sixteen in or just outside Bushy Viners 
on the eastern border and twenty in Ninhams Wood, the oak wood at the northern end of 
the root field. Of those caught in Ninhams Wood, three were also caught at the edge of the 
kale and five farther down the field. The total number of mice caught in the whole of the 
fields area was 142 (64 males, 71 females and 7 of undetermined sex). Notably absent from 
the small Lake Wood batch were the twelve mice caught at the other margin of the wood 
in the peripheral hexagons of the western section of the grid, only 200-400 yards away. 

Traps were only set once at each hexagon in the peripheral row during the double shift 
over the rest of the area which lasted from 19 October in week 3 to 10 February in week 19 
(see Chart XIII). This probably reduced the number of mice caught in peripheral localities. 



5·3. Final intensive trapping in fields 


In March, after the traps had been shifted twice round all the central hexagons, the 
following intensive trapping was undertaken in nearly all localities other than grassland: 

(a) Two lines of traps at 10 pace intervals, one inside and one outside the hedge separating 
Ninhams Wood from the root field. For the first four nights of week 22. 

(b) The four more northerly hexagons of the root field (see Chart XIII) and an extra 
hexagon in the centre of the space between them. For the following two nights of week 22, 
before the mice caught in (a) were released. 

(c) More than 40 traps on Broom Bank scattered irregularly over the slope. For three 
nights in week 22. 

(d) Three of the hexagons on the borders of Broom Bank (see Chart XIII). For three 
nights in week 22. 

(e) A line of traps at 10 pace intervals through the heart of the thorn thicket. For four 
nights in week 22. 

(f) A line of traps right round the thicket, just within its margin. For two nights in 
week 24. 

(g) A line of traps right round the thicket, in the grass, 15 paces outside it. For one 
night in week 24. 

(h) A line of traps at 10 pace intervals along the whole of the west margin of Bushy Viners. 
For four nights in week 24. 

(i) A line of traps, nearly 300 yards long, at 10 pace intervals along the east margin of 
Lake Wood. For four nights in week 24. 

(j) The same five hexagons as in (b) above. For 3 nights at the end of week 24. 

In week 30, six weeks after this March trapping, we set traps for one night in Ninhams 
Wood and in the narrow strip of kale not yet eaten by the sheep at the northern end of the 
root field. 

In all the March trapping only seven new mice were caught, three in Lake Wood on the 
periphery of the area and four in the thorn thicket near the periphery. They are included 
in the number already given for these localities. No new mice were caught in week 30. 


5·4. Number of mice caught in grassland 

In the nineteen hexagons which lay right out in the grassland we caught only twenty-one 
mice. Of these, seven were never caught again and only one was caught again in the grass- 
land. All the rest were caught elsewhere in some more frequented locality, generally in 
a hexagon near to the one in which they were caught in the grass. Ten of the nineteen 
hexagons lay on the periphery of the grid and traps were only set there once, while in the 
remaining nine hexagons there were only the two regular settings. More movement of mice 
in the grassland might have been detected had we combed it by intensive trapping in 
March as we did in the more populous localities, but judging by our experience there it is 
unlikely that many new mice would have been caught. 


5·5. Notes on Charts VII-XII (see p. 396 for explanation) 


Wide ranges. Chart VIII shows the observed ranges of all mice, male and female, which 
were found in traps farther apart than a knight’s move on the grid. F and a are especially 
noteworthy as they are known to have returned to where they were first caught. F was 
not only caught in the 8th and 24th weeks in Bushy Viners and, intermediately, in the 
22nd week in the thorn thicket, but was also caught in the 19th week right in the middle 
of the grassland that lay between these two thorn areas, suggesting that it may have made 
this crossing frequently. In Chart X, f appears to have been intercepted when making 
a similar crossing in the 17th week. In Chart VIII, a was caught in the 12th and 17th 
weeks at the northern end of the kale and intermediately in the 15th week at the hexagon 
just north-west of the thorn thicket; b was caught in the 6th and 10th weeks in the turnips 
and kale, in the 12th week in the thorn thicket, and failed to be caught again; while E was 
caught in the 15th, 22nd and 24th weeks in or near the thorn thicket, and lastly in the 
30th week at the northern end of the kale. We might think that E and b were migrants, 
the one from the root field to the thicket, the other from thicket to root field, but the 
records of F and a seem to make it quite possible that E and b also chanced to be caught 
when on a long travel from home and that the dates of catching have no significance. 

The greatest distances between any two places in which these mice were caught are 
approximately: a, ♂, 303 yards; b, ♀, 310 yards; F, ♀, 358 yards; E, ♂, 375 yards. These 
distances are all considerably greater than a knight’s move between two hexagons of the 
grid. The two greatest distances, those of E and F, do not appear in Table 1 because three 
of the traps concerned were not a part of the regular grid system; in that table F is included 
among the knight’s moves, while E is in the lowest group among those caught in one 
hexagon only, an excellent example of the fortuitous nature of such records. 

Two other mice in the eastern area were found ranging over a distance greater than a 
knight’s move; these were VIII A and B, caught in both wood and root field. Their 
distances are included in Table 1: B, ♀, 280 yards; A, ♀, 362 yards. Since the weeks in 
which B was caught in the root field alternated with those in which it was caught in the 
wood, the distance seems clearly the result of wide rambling round its home centre. That 
there was no such alternation in the case of A seems insufficient grounds for regarding it 
as a migrant from wood to field. It was last caught in the kale in the 19th week and 
probably did not survive till March, or it would almost certainly have been caught in the 
22nd and 24th weeks in one of the many traps then set at the field hexagons and along the 
wood margin. Its chances of being caught again in the wood before this were small as 
traps were set there so rarely. 

Root field and Ninhams Wood (Charts IX and X). Of the six mice in Chart VIII with 
ranges greater than a knight’s move, two were males and four were females. From 
Charts IX and X we see that three males and five females were caught in traps a knight’s 
move apart. This is a higher proportion of widely ranging females than was observed in 
the western area where, out of almost the same total number of females, only one knight’s 
move and one move greater than this were observed. 

Five of the knight’s moves include unoccupied grassland (IX F and G; X E, C and f), 
while the remaining three lay within the root field and Ninhams Wood (IX E; X B and a). 
The longer ranges of A and B in Chart VIII were also within this area. The female X B is 
especially noteworthy as it was caught at all but one of the six possible hexagons in the 
root field, with a knight’s move in two directions. Had we set traps more often at the 
peripheral hexagons in the wood, we might have extended the ranges of some of the mice 
not detected so far north. To the south, chances of observation were curtailed by the folding 
of sheep at that end of the field, which prevented our setting traps there in March. But it 
seems clear that the major factor restricting movement was the grassland which lay on 
either side of the field; although the mice ventured there occasionally, these visits or 
crossings were so rare that they did not often chance on our traps. 

Short ranges (Charts XI and XII). As in the western area, the majority of short-range 
male mice were mice which disappeared early from the area, before the intensive trapping 
in March, and so reduced the chances of our recording a wider range. Of the root field mice 
in Chart XI, B alone was caught in February, but like the others did not turn up in March; 
it was first caught in the adjacent grassland hexagon to the east and may have been an 
occasional visitor from Bushy Viners, where trapping was probably insufficient to mark all 
the mice at the beginning of the season. X in Chart XI was caught at the wood margin of 
the western area in week 18 (see top of p. 401) and turned up 545 yards away in the north- 
west corner of the root field in week 24 and again in week 30. A late arrival in both areas, 
it may have lived at a distance from all the hexagons, perhaps somewhere in the woods to 
the north. It may be compared with VIII E, which came from the opposite direction. In 
neither case is there any proof that these mice were migrants which settled in the kale. 

In contrast to the males, a number of females in the root field and in the wood bordering 
it are shown in Chart XII to have survived throughout the season without being caught in 
two traps farther apart than adjacent hexagons of the grid. 

Thorn thicket. Notable short ranges among the males in Chart XI are those of G and H 
in the 2-acre thorn thicket. H was first caught there in October and G in December; both 
were still there in March, and neither was ever caught outside the thicket. The female XII T 
had a similar history to the male XI H. 

It is difficult to understand why no travels were detected between the thicket and Broom 
Bank, but apart from the narrow connexion between these two across the gully, the thicket 
was surrounded by grassland, and the only travels observed were the very long ones across 
this grassland to and from the root field and Bushy Viners. There may have been 
movements across the narrower stretch of grassland between the thicket and the pig pens in the 
wood to the south-east, but there was no chance of observing them. Three mice first caught 
in the thicket in March possibly came from this wood (XI E; XII O and P). 

Mice caught once only (Chart VII). The distribution of these mice is very different from 
that of the mice caught once only in the park area. There they were concentrated round 
the periphery; here no mice at all were caught at more than half the peripheral hexagons. 
This is because most of these hexagons lay in grassland. Mice caught once only are 
concentrated in the areas where most mice were caught, and especially in the root field and 
thorn thicket. Nevertheless, the proportion of them in these areas is very high, twenty- 
three out of fifty-six mice in the root field, and nine out of nineteen in the thorn 
thicket. Mice found dead in the traps and marked with the symbol + are excluded from 
these counts. 

Most of the mice were caught in the earlier weeks of the season, but it is hard to say why 
some were not caught before when they appear to have had plenty of chances, especially 
the four caught as late as week 17 at the kale hexagon next to Ninhams Wood. 

These differences from the park area can perhaps be attributed to the greater concentra- 
tion of mice, or, in the case of the root field, to the less permanent nature of the habitat. 
Many of those caught in this field were possibly occasional visitors from a distance, like 
VIII E and XI X. 

The mouse marked B on the periphery of this diagram and on the map had been caught 
in week 4, in October, 135 yards to the south in the same hedge as mouse A described on 
p. 401; it was not recaught in the hedgerows when traps were set there in February, nor 
were any other mice from the eastern area. 


6. FREQUENCY WITH WHICH LOCAL MICE ARE CAUGHT ON A FIRST NIGHT OF TRAPPING 


In 1938-9, when the observed wandering of the mice was much restricted by our system of 
trapping, we found that those mice caught throughout the season at a single hexagon 
tended to be caught on the first night of trapping, while those caught intermittently, or at 
more than one hexagon, tended to be caught on the later days (Hacker & Pearson, 1946). 

Under the very different system of trapping in 1937-8, described above on p. 392, the 
proportion of first night catches was still high for some groups of mice, notably for those 
males caught throughout the season in the centre of the western area. 

To illustrate this we have set out in Table 3 the night of catching of the six males and 
seven females which were caught frequently in that area. We have not included mice which 


Table 3. Night of catching, in the different weeks of trapping, of all those mice caught in the 
first rotation of traps which are known to have survived until the end of the second rotation. 
Western trapping area 

Week of trapping 

Chart Maximum No. of First rotation Second rotation Final trapping 
and range hexagons 
mouse (yd.) visited 2 5 “| 9 11 4 636¢06lCUdTSSlUiC CCU Ci 

Males 

II C 130 3 —_- —- — 1 —- — 2 1 —- — (3) 1 
II E 205 3 —- — 2 1 —_-_ — 3 1 (1) (1) — 4 
II D 232 4 — 2 _ —_— 1 1 — 1 (1) (1) — 3 
II L 224 3 — 1 = 1 — 3 2 3 —- — — 1 
II K 244 4 — 1 — 1 —_ — 1 1 _—-_ _—- — 3 
II J 283 5 —_-_ — 3 1 1 1 2 2 —_—-_ —_—- _- =- 

Females 

III U 100 2 _- -—- —- 2 — — 38 1 — 1 3 
III W 111 2 _ 1 — 383 — 3 — 1 l | — 2 
III Z 93 2 —- — 3 2 —- —- — l — _ — 
IV R 161 2 —- — — 2 1 —_ — 1—- — — 5 
IV C 153 2 — ] —_-_ — — 2 1 ae DD a 
IV B 126 2 3 —- —- — I —_ — 2 - ] j 
IV D 207 4 — 1 — 2 —- —- 2-—- — = a 

disappeared early in the season, mice caught only on the periphery, or mice not caught in 
the centre of the area until after the first rotation of traps. The table shows the chart in 
which the mouse’s wandering is recorded, its index number in that chart, the maximum 
range of the mouse and the number of hexagons at which it was caught. When a mouse was 
caught at a trap set between the hexagons the night of catching is put in brackets. 

The frequencies of the three possible nights of catching during the regular shifting of 
traps (first and second rotation) are summarized below. Nights in the five final trappings 
are here omitted because the chance of a mouse getting caught then was quite different 
(§ 4·3); in weeks 20, 21 and 23 the traps were concentrated in certain places, while in 
week 25 they were spread over thirty-seven adjacent hexagons and left out for six nights, 
but only two traps were set at each hexagon; in week 30 the traps were left out for only 
one night so that every mouse caught was a first night catch. 


Night of catching in weeks 2-18 

             First   Second   Third 
6 males        16       6       4 
7 females      10       8       4 


It will be seen from this summary that males were caught on the first night on sixteen 
out of twenty-six occasions (62 %) and females on ten out of twenty-two occasions (45 %). 
The males were the greater travellers, and evidently the chance of their finding a trap on 
the first night was very high, even if the sites were at a considerable distance from their 
holes. II K was caught on a first night in four different hexagons, two of them at the 
extremes of its 244-yard range, II J on a first night at three hexagons, II D at three 
hexagons and at two extra points, II E at two hexagons and at two extra points. Even 
among the females, IV C and IV R were each caught on a first night at two hexagons a 
diagonal apart on the grid. 
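
The percentages quoted above follow directly from the summary counts. A minimal sketch of the arithmetic, in which the function name `first_night_share` and the dictionary layout are illustrative choices and not part of the original analysis:

```python
# Counts of first-, second- and third-night catches in weeks 2-18,
# copied from the summary for the western trapping area.
catches = {
    "males": (16, 6, 4),
    "females": (10, 8, 4),
}


def first_night_share(counts):
    """Return the fraction of all catches that fell on the first night."""
    return counts[0] / sum(counts)


for sex, counts in catches.items():
    share = first_night_share(counts)
    print(f"{sex}: {counts[0]} of {sum(counts)} catches on the first night = {share:.1%}")
```

Rounded to whole numbers these shares give the 62 % and 45 % stated in the text. If a mouse were equally likely to be caught on any of the three nights, the first-night share would be about one-third.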

There was a much lower proportion of first night catches among mice caught only on the 
periphery of the area, or in the case of males on the periphery and at one central hexagon 
(33 % for males and 40 % for females). It would appear that here a greater number of the 
mice were occasional visitors from a distance and no more likely to find the traps on the 
first night than on any subsequent night. 

The eastern trapping area seems to have been an area of travel and disturbance. The 
number of mice which disappeared early was very large, and so was the number not caught 
until the second rotation of traps, and it is hard to select with any confidence a group 
of resident mice comparable with those from the centre of the western area. The night of 
catching of those mice caught at central hexagons of the eastern area throughout most of 
the season is given in Table 4, but if this table is examined together with the relevant 
charts it is clear that little information can be gained by pooling the individual happenings. 
Owing to the abrupt changes in environment from vast stretches of grassland to the 
limited areas in which the mice lived, or which in some cases they perhaps only visited for 
food, the behaviour of each mouse needs separate consideration in relation to the different 
places at which it was caught. If all the mice are taken together the proportion of first 
night catches is still high, about 50 % for both males and females. Among the females, 
however, there is a distinct division between the three which were caught at four or five 
different hexagons and the remaining five which were caught at not more than two adjacent 
hexagons and some near points. All three of the former (VIII A, X B and E) were caught 
travelling widely in the root field and were only caught on a first night on four out of 
fourteen occasions; four of the six third night catchings were of a single mouse, X E, which 
was caught in this field on five occasions and at four different hexagons; it was caught on 
a first night at the top of Broom Bank in week 8, and might be supposed to have lived there, 
but it was not caught there when the traps were set there again in week 17, although it was 
still alive. In contrast to these three mice, the five females which were not observed to 
travel were caught on a first night on as many as eight out of eleven occasions, a very high 
proportion of first night catchings; three of these females appear to have been confined to 
a small area in the root field (X D, XII F and H), one to the centre of Broom Bank (XII N) 
and one to the thorn thicket (XII T). 


Table 4. Night of catching of mice surviving in eastern trapping area 


Week of trapping 

Chart Maximum No. of First rotation Second rotation Final trapping 
and range hexagons 
mouse (yd.) visited 3 6 8 wT! 6 oS 19 22 24 
Males 
IX D 144 4 —- — — 2 1 — 2 1 (1) 1 
IX E 223 3 - —- — 2 1 3 a 1 —_—_ — 
IX F 223 3 — 2 —- — 2 1 1 — (i) — 
IX G 224 3 — 2 1 — — 1 — 3 —- 
Females 

VIII A 362 4 — 2 —_—_ — 3 2 — 1 —_-_ — 
X B 224 5 SELLS VegSe OURS Pages GEE Cf Pere Ere oa 
X E 224 5 a ae oe ae, ae ee ee ee 
X D 123 2 —- «»—— s — 2 — — 1 1 (1) 1 
XII H 100 2 a Ve Vo a ae 
XII N 75 1 —- — — 3 —_— — — 1 (ql) — 
XII T 74 1 1 - -_- — 1 — — — (1) (1) 
XII F 37 1 —_—-_ — 1 -_- —- — 1 — (1) 1 


The possibility that our trapping methods encouraged mice to wander farther than they 
would have done under natural conditions needs consideration. Since each morning we 
took home the night’s catch and only released the mice at the end of that week’s trapping, 
it might be supposed that on the later nights mice were drawn in from farther and farther 
afield because they found no others occupying the area. It is possible to interpret in this 
way the occasions when a mouse was caught at an extreme of its range on a third night, in 
a populous locality from which most of the other mice had already been removed. But 
there is really no reason to think that this is why it visited the locality. The local mice 
would tend to get caught earlier in the night, and so, until we had removed them, reduce 
the chance of a traveller finding an empty trap. In contrast to such third night catches, 
VIII b was caught at each extreme of its 310-yard range on a first night, in week 10 in the 
kale at a hexagon where nine other mice were caught in the same week; and in week 12 in 
the thorn thicket where eight other mice were caught. 

Since mice which live near to traps have a much higher chance of getting caught in them 
than others living at a distance, whatever the system of trapping, it would appear difficult 
to estimate the number of mice living or surviving in an area from the proportion of marked 
mice recaught in repeated trapping; the mice caught will not be a random sample from 
a population in which all have an equal chance of being caught. The mice living near the 
traps will have a higher chance of being caught each time. In our experience it should not 
be difficult, except in the breeding season, to catch all the mice actually living in any area 
covered by the traps, if these are sufficiently closely spaced. But it would be much more 
difficult to determine, without extensive repeated trapping of the surrounding areas, which 
of the mice caught were only occasional visitors from a distance; with a surplus of traps in 
the centre there would probably be many of these visitors, while with too few traps not all 
the central mice would be caught. 
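
The effect of unequal catchability on an estimate based on the proportion of marked mice recaught can be illustrated numerically. The sketch below is not a method used in this paper. It assumes the familiar two-sample estimate M × C / R (often called the Lincoln-Petersen index), a hypothetical population of 100 mice, invented capture probabilities, and independent capture on the two occasions.

```python
def expected_two_sample_estimate(n_mice, groups):
    """Large-sample value of the two-sample estimate M * C / R.

    groups is a list of (share_of_population, capture_probability) pairs.
    Each mouse keeps the same capture probability on both occasions,
    and the two occasions are assumed independent.
    """
    # M: expected number caught and marked on the first occasion.
    marked = n_mice * sum(share * p for share, p in groups)
    # C: expected number caught on the second occasion (same trapping effort).
    caught = marked
    # R: expected number caught on both occasions.
    recaught = n_mice * sum(share * p * p for share, p in groups)
    return marked * caught / recaught


TRUE_N = 100  # hypothetical population size

# Every mouse equally likely to be caught.
equal = expected_two_sample_estimate(TRUE_N, [(1.0, 0.5)])

# Half the mice live near the traps (p = 0.8), half at a distance (p = 0.2).
mixed = expected_two_sample_estimate(TRUE_N, [(0.5, 0.8), (0.5, 0.2)])

print(f"equal catchability:   {equal:.1f}")
print(f"unequal catchability: {mixed:.1f}")
```

With equal catchability the estimate recovers the true number. With the unequal probabilities chosen here it falls well short, because the mice near the traps dominate both the marked and the recaught groups.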


7. CHANCES OF INTERBREEDING 
7·1. Probable range of mice 


We may now ask whether distances greater than the 204-244 yards of a knight’s move 
on the 100-yard grid would have been more frequently observed if the effective mesh of 
the grid had been even wider than the 200 yards of our 1937-8 trapping. The following 
evidence makes this unlikely. 

(a) Our irregular system of trapping on South Haven Peninsula in 1937 gave a wide 
range of distances between trapping sites with no traps set intermediately to intercept the 
mice, but only eight travels of over 244 yards were recorded (Hacker & Pearson, 1951). 

(b) There was no interchange of mice between the two areas of the grid in Holwood Park 
in 1937-8. Trapping was not simultaneous in these areas and they were separated in one 
place by only 200 yards of woodland; yet we have only one record of a mouse caught in 
both areas (X on map, see top of p. 401 and p. 404). 

(c) In our hedgerow trapping to the south of the gridded areas in 1937-8, described in 
§ 3·3, one part of our line of traps was only 200 yards from the nearest hexagons of the 
western area of the grid and another part only 100 yards from the eastern area. Of all the 
fifty marked mice released in week 4 at the places where they were caught along this line 
we have seen that only two were recaught on the grid (A and B on map, see pp. 401 and 405). 

(d) In the narrow strip of woodland to which we limited our 100-yard grid in the 1938-9 
season (the twenty-one double circles on the map, see p. 394), we again did not set out all 
our traps simultaneously, but confined ourselves in one week of each month to the nine 
hexagons at the western end of the strip and in the following week to the twelve hexagons 
at the eastern end. This time the two groups of 3 x 3 hexagons and 3 x 4 hexagons were on 
a continuous part of the grid, forming together three rows of seven hexagons at 100-yard 
intervals. During the season 164 mice were caught in the western group of hexagons and 
205 in the eastern group, but only fifteen of them were caught in both groups. Only one of 
these was caught in two hexagons farther apart than a diagonal of the grid, yet distances 
of up to 465 yards could have been travelled without interception by trapping sites. The 
observed frequencies for these fifteen mice were: 80-120 yards, 7♂♂, 3♀♀; 121-161 yards, 
4♂♂; 315 yards, 1♂. 

At first sight it seems puzzling that there was so little wandering from one part of the 
wood to the other in the winter of 1938-9, less than our results from 1937-8 would lead us 
to expect. It might be suggested that the greater number of mice present on the area 
restricted individual movements, but we have seen that in 1937-8 there was no evidence 
of any such restriction in the root field, the most populous locality in that season (p. 401). 
Perhaps a sufficient explanation is apparent when we remember that the maximum ranges 
observed in 1937-8 were in many cases not the result of a single travel. They were the 
distance between two traps in which a mouse was caught on two separate occasions when 
it wandered out in a different direction from its home. In 1938-9, with hexagons set at 
100-yard intervals, a mouse was usually caught in the hexagon nearest its home. This is 
shown by the frequency with which both male and female mice were caught in the same 
hexagon each month (see Table 1). In a week when we were trapping in one part of the 
wood, a mouse living near the hexagons bordering the other part was likely to be caught 
in one of these border hexagons before it could wander farther and encounter more distant 
traps; its range was thus restricted on one side. When our traps were set in the other part 
of the wood this same mouse would again be intercepted by a border hexagon, most prob- 
ably the one adjacent to that where it was apt to be caught across the border, or at the 
most at a diagonal distance from it. A mouse living farther from the border, nearer to the 
hexagons of the second row, was likely to be caught each month in one of these. To reach 
even a border hexagon in the other part of the wood it would have to travel in one night 
a distance of the order of 200 yards from its home. This would give the mouse a potential 
range of 400 yards with its home as centre. The habitual ranges of the fourteen mice which 
travelled to and fro between the border hexagons without penetrating deeper may well 
have been of the order of 200-280 yards, or twice the distances actually observed, bringing 
us back to something like 244 yards or the outer limit of a knight’s move from hexagon 
to hexagon on the 100-yard grid. 


7-2. Scattered distribution 


Throughout the large area of open woodland to the west of the Bulwarks we have seen 
that the distribution of Apodemus in the winter of 1937-8 was continuous if not quite 
uniform. The general pattern of this distribution was of a scattered population, the in- 
dividuals of which had each their own range, the males ranging more widely than the 
females. The ranges of those mice surviving throughout the winter’s trapping overlapped 
each other but apparently did not coincide. 

In the root field of our eastern trapping area, the most populous locality of all, the ranges 
of those mice surviving the winter overlapped to a greater extent than in the woods but 
were again spread over the area in such a way as to suggest that each mouse had a different 
home centre. It is possible that these homes were not in the field itself but either at a 
distance from it or scattered along its margin where there was a raised grass border left by 
the plough; the apparent concentration of mice in Charts IX and XII at the Ninhams 
Wood border of the field is largely the effect of the double line of traps placed inside and 
outside the wood in week 22 and of repeated trapping of the same individuals. 

Few mice were caught on the grassland in either area. Mice from more sheltered 
localities were frequently caught at a near-by hexagon in the grass, but catches were rare 
at hexagons more than 100 yards out. In the park, continuity of distribution across the 
widest area of grassland was maintained until the beginning of February by a single 
widely ranging mouse which apparently lived there (J, Chart II and p. 400), but this 
mouse did not survive until the breeding season. Mice have often been found elsewhere 
in burrows in turfy ground in the summer, but from the ranges of the Holwood mice it 
seems unlikely that any except this mouse, and possibly II L, were living out in the grass- 
land that winter. In the fields we found no evidence of grassland residents, but mice 
sometimes crossed wide areas of grass from one populous locality to another (VIII F, a, b 
and E). These crossings were among the longest travels observed, though such travels were 
not confined to grassland. Thus the open grass areas were not a complete barrier to 
Apodemus, and interbreeding may have taken place between the mice living on either side. 
Since in several of those cases in which mice made long travels they were recaught later 
in the locality from which they originally came, possibly all such travels were of the 
nature of visits rather than migrations. In all our Holwood trapping we found no evidence 
that a known inhabitant of one locality had settled in another at a distance. This was in 
accordance with our observations on South Haven Peninsula, not only of the natural 
wanderings of mice but of their tendency to return home when released at a distance from 
where they were caught. Many were able to find their way back from longer distances than 
any recorded for natural wandering, while those which failed to return were not caught 
later in the place where they were released, nor anywhere else where traps were set on the 
peninsula (Hacker & Pearson, 1951). 

At Holwood none of the mice left at the end of the 1937-8 season were ever recaught 
in the smaller area of our subsequent trapping, but in 1939-40 we recaught three males and 
six females from the previous season, the most populous in our experience. It is note- 
worthy that all six females were caught at the same hexagons, but that the males were caught 
at neighbouring hexagons adjoining, but not overlapping, their previously observed range. 

Migration does of course take place into empty areas with attractive food supply, such 
as heathland regenerating after fire or cultivated land growing a new crop. Some of the 
mice caught in the Holwood root field in the autumn of 1937 may, as just suggested, have 
lived round the margin of the field, while others appear to have been casual visitors, but 
any that had their homes within the field itself must have immigrated after the last of the 
summer cultivations (p. 401). This may have been either before or after the second 
breeding season, which usually begins for Apodemus towards the end of the summer. The 
mice caught in October and November were of both sexes and had a size distribution 
similar to that of the woodland population in these months. 

We have seen that in the fields an unexpectedly high proportion of the wide ranges 
observed were those of females. Their weights when first caught were distributed in 
expected frequencies over the usual autumn range (Hacker & Pearson, 1944, p. 159 and 
figs. 10-12). There was no indication that either the young or the mature females were 
pushing out in search of new homes, or that wandering was more prevalent in one month 
than in another. Nor did those males which ranged widely belong to any one age group, 
judging by their autumn weight. During the winter months in which most of our trapping 
took place travels were probably in search of food rather than of mates. In the woods the 
males were not detected ranging widely till the end of March, but this may have been 
because the food there was only then becoming scarce and have had no connexion with 
the onset of breeding. 


7-3. Survival 


In the park area about half the mice caught in the central hexagons in the autumn seem 
to have disappeared before the breeding season, and this, together with the overlap of 
ranges, suggests that little or no close inbreeding was likely to have occurred. This winter 
loss of population has been described for later years in a previous paper, where the monthly 
survival rate was calculated to be about 88 %, only 45 % of the mice surviving for 
six months (Hacker & Pearson, 1946). Already at the beginning of a trapping season it was 
doubtful, judging by the weights of the mice caught at any one hexagon, whether more 
than one or two remained out of an original litter; even if there were more than one they 
may have been of the same sex. 

It is not possible to calculate a survival rate for the 1937-8 season. The traps were set 
at only one in every four hexagons in any week during the first 18 weeks of routine trap- 
ping, and this 200-yard spacing was not sufficiently close to ensure that all the mice were 
caught which were present on the area in that week. There was a gap of nine weeks before 
we returned to the same hexagon. Thus the mice were recaught at irregular intervals which 
were often longer than a month, and we could not determine the probable length of survival 
of a mouse which was never recaught. A rough assessment of survival in the park area may 
be made as follows: thirty-four mice (two dead) were first caught in central hexagons in 
the first rotation of traps during weeks 2, 5, 7, and 9; and eight others (one dead) in the 
second rotation during weeks 11, 14, 16 and 18. Of the thirty-nine mice released alive only 
nineteen were recaught in the prolonged trapping at all hexagons and at many intermediate 
points in weeks 23 and 25. Only two new mice were caught in central hexagons in these 
two weeks. 

The disappearance of mice from the root field was even greater than that from the woods. 
Of mice caught at least once in this field, thirty-one (two dead) were first caught in weeks 
3, 6, 8 and 10, and twenty-one others in weeks 12, 15, 17 and 19. During the later in- 
tensive trapping in weeks 22 and 24, only thirteen out of the fifty mice released were 
recaught and there were no new mice. This may be partly because the folding of sheep at 
the southern end of the field prevented our setting traps at the two southern hexagons in 
weeks 22 and 24, but many mice were caught once only, or over only a short period, at the 
northern end of the field as well. Some may have been sporadic visitors from outside the 
grid, but predators, such as a pair of stoats actually seen there, probably accounted for 
most of them. 

Of the thirty-nine mice just referred to as caught and released in the centre of the park, 
eighteen were males, twenty were females and one was of undetermined sex. Of the nine- 
teen of these mice which survived in this area till the breeding season, seven were males 
and twelve were females. Of the fifty original root field mice released alive, twenty-four 
were males, twenty-four females, and two of undetermined sex. Here, of the thirteen known 
survivors only three were males while ten were females. We are unable to explain why more 
males disappeared in this season; in other years they were in excess when breeding started. 


7-4. Commencement of breeding 


Our March trappings in 1938 coincided with nearly three weeks of exceptionally warm, 
sunny weather and the breeding season began very early. Table 5 shows for each week of 
trapping the number of females found with an open vulva in the early months of this year. 
In week 25, at the end of March, nearly three-quarters of the females caught were in this 
condition and two of them gave birth to young before we could set them free. In later 


Table 5. Number of females with open vulva in the spring of 1938. 
For calendar of weeks see § 3-2 


P=park or western area. F = fields or eastern area. 
Week 14 15 16 17 18 19 22 23 24 25 30 
Area P F P F P F F P F P Both 
Total 29 13 13 11 13 11 9 20 13 19 29 11 


Open vulva 0 0 0 2 1 1 2 6 7 21 11 
years, out of a much larger number of females caught in March, we found very few with 
an open vulva. The only year in which we continued our routine trapping at Holwood later 
than March was 1939. Then we found in April an open vulva in all but one of the fifty-nine 
females caught in weeks 29 and 30, but there were no signs of advanced pregnancy in any 
of them, and no young were born in the traps and cages until May. 

Though we have no direct evidence on the extent of interbreeding, a study of the charts 
shows that the mice were still ranging widely when mating commenced. It is clear, there- 
fore, that there were ample opportunities for mating between mice coming from places at 
a considerable distance from each other. 


7-5. The need for more field work 


Much more information needs collecting, over a number of years, about the inter- 
relationships of mild weather, length of breeding season, density of population, abundance 
of food supply, the tendency of mice to wander, and the consequent chances of close 
inbreeding. 

More details are also wanted about the habits of individual mice. Is a male as well as 
a female associated with each nest in the breeding season? Are these nests new, and if so 
how far distant from the birthplace of each parent? Does this distance vary with the nature 
of the country and food supply? Does the food of the mice differ much from place to place 
or at different seasons? To answer these questions much more observational work is 
necessary, including examination of stomach contents. Trapping would have to be directed 
to locating the nests of individual mice at different times of year. 


8. SUMMARY 

1. The frequency with which mice are observed to travel different distances depends 
(a) on the spacing of trap sites, (b) on the shape and size of the area covered, (c) on whether 
traps are shifted from site to site or set on each site simultaneously so that mice are inter- 
cepted at a near site before they can reach a more distant one. 

2. For these reasons a frequency table of the distances observed under any rigid system 
of trapping may give an entirely false idea of the average range of mice. In any case an 
average calculated from such a table is likely to mislead, as the set of possible distances is 
discontinuous and irregular, and the number of times each distance occurs within the 
system is not the same. 

3. Comparison between the results of two different systems of trapping on a 100-yard 
grid in Holwood Park, one in 1937-8, the other in 1938-9, illustrates the artificial nature 
of the frequencies with which different ranges were observed (Table 1). In 1937-8 the 
maximum range most frequently observed was between two groups of traps 100 yards 
apart. In 1938-9 the majority of mice were never caught ranging beyond the limits of a 
single group. 

4. Besides the arbitrary set of distances imposed by a rigid trapping system there are 
biological factors which also affect the chance of observing long travels. Among these the 
most obvious are (a) the nature of the country, (b) the length of time a mouse survives. 


5. Apodemus was caught in very varied environments throughout Holwood Park and 
its neighbourhood, but only rarely in traps more than 100 yards out in the open grass areas. 
6. If from the 1937-8 records shown in Table 1 all mice are excluded (a) which were 
caught in the fields area, (b) which in the more wooded park area were caught only on the 
periphery, and (c) which failed to survive till March, then the maximum range most 
frequently observed in that season rises for males to the distance between two groups of 
traps 224 yards apart, or a knight’s move on the 100-yard grid (Table 2). For females the 
mode is still 100 yards, but only one out of twenty-three remains of those confined to a 
single group of traps. 

Chart VI. Trapping scheme, showing weeks in which traps were set in hexagons on the grid. The many intermediate points at which traps were set in weeks 20, 21 and 23 are not shown. 

Western trapping area: Chart I (♂♂, shorter ranges); Chart III (♀♀, shorter ranges); Chart IV (♀♀, longer ranges). 

Map of Holwood Park and adjoining farmland, showing distribution of trapping sites. Six traps were set round each circle. The sites used in 1938-39 are shown as double circles. Key: woodland, bracken, scrub, shrubberies; arable; grassland, gardens, roads, etc.; ponds. Scale bar: 200 yards. 

Chart XIII. Trapping scheme, showing weeks in which traps were set in hexagons on the grid. The many intermediate points at which traps were set in weeks 22 and 24 are not shown. 

7. These ranges agree fairly well with our records from South Haven Peninsula where 
no grid was used but where the trapping sites were chosen for differences in vegetation. 
There the distances between them were more or less random, varying from a few yards to 
over 3000 yards without simultaneous trapping. 


8. The two longest travels observed at Holwood were of 545 yards between two separate 
sections of the grid and 645 yards between one of these sections and a wood outside it. 


9. Some of the longest travels at Holwood were across uninhabited grassland, but others 
were within the populous areas. We found no evidence that mice were deterred from 
wandering by the presence of neighbours, or that we ourselves increased their range by 
trapping and temporarily removing these neighbours. 

10. Among mice caught frequently, the proportion caught on a first night of trapping 
was high. This was probably because mice living near the traps usually found them on a 
first night. Mice caught less frequently and probably living at a distance from the traps 
were no more likely to find them on the first night than on any other night, and their chance 
of being caught would be reduced by their finding traps blocked by local mice. 

11. We found no evidence of migration to a distance, either on South Haven Peninsula, 
or in five successive seasons at Holwood. In 1939-40 there were nine survivors from the 
previous season. Of these, the six females were each caught at the same site as before; the 
three males were at quite new sites which, however, adjoined the earlier ones. 


12. The wandering observed in both sexes during the winter months of 1937-8 persisted 
after mating had started in the spring. By that time the surviving population, much 
reduced in numbers, was thinly distributed over the area, their ranges overlapping but not 
apparently coinciding. Thus the chances of close inbreeding appear to have been small. 
We found no evidence of isolated inbreeding communities, while there were ample 
opportunities for mating to occur between neighbours living 100 yards or more apart. 
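
Points 1 and 2 of the summary can be illustrated with a small sketch. It assumes, purely for illustration, a rectangular grid of three rows of seven sites at 100-yard spacing (the 1938-9 layout described in the text) and ignores the spread of the six traps round each site. The function name `distance_counts` is hypothetical. The output shows that the possible inter-site distances form a discontinuous set and that each distance is available a different number of times, which is why an average range computed from observed frequencies is likely to mislead.

```python
from collections import Counter
from itertools import combinations
from math import hypot


def distance_counts(rows=3, cols=7, spacing=100):
    """Count how often each inter-site distance (nearest yard) occurs on the grid."""
    # Assumed layout: sites on a rectangular lattice, `spacing` yards apart.
    sites = [(c * spacing, r * spacing) for r in range(rows) for c in range(cols)]
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(sites, 2):
        counts[round(hypot(x2 - x1, y2 - y1))] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for dist, count in distance_counts().items():
        print(f"{dist:4d} yards: {count:3d} pairs of sites")
```

On this assumed layout the 100-yard step and the knight's move (about 224 yards) are each available between 32 pairs of sites, while the diagonal (about 141 yards) is available between only 24.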
Samples with the same number in each stratum 


By W. L. STEVENS, Faculdade de Ciências Econômicas e Administrativas, 
Universidade de São Paulo, Brazil 


1. PRACTICE 


The problem discussed here arose during the initiation of a permanent sampling survey, for crop estima- 
tion and forecasting, by the Division of Rural Economy, Secretariat of Agriculture, State of São Paulo, 
Brazil. 

Suppose that the sample, in the present instance of farms, is to be stratified by counties. Then, if the 
within-county variance were assumed constant, it would be correct to use a uniform sampling fraction, 
so that the number of farms of a given county in the sample will be proportional to the number of farms in 
that county. 

When the field work is done by county officers, it may, however, be more convenient to sample the 
same number in each county. Among the advantages, we note: 

(a) the work is spread uniformly over the county officers and there will be no complaints from Mr A. 
that he has to do four or five times as much work, visiting farmers, as his colleague, Mr B., in a smaller 
county; 

(b) the printed form and instructions can be standardized; 

(c) the computation of estimates can more easily be reduced to a routine. 

Against these advantages it may be claimed that the computation, for a sample with a nominally 
constant sampling fraction, is simpler, because we have only to divide the total by the nominal sampling 
fraction. For a sample with constant numbers, we must, of course, use the true sampling fractions. At 
the same time, if the numbers in the strata are small, the true sampling fractions will depart considerably 
from the nominal in the constant fraction sample. If the short method of estimation is to be used, care is 
then needed in the technique of selection. 

Another line of argument leads us to the conclusion that it may be advantageous, not merely that the 
number should be constant, but that it should be two in each stratum or substratum. Granted that we 
want unbiased estimates of the standard errors, then we must draw at least two units at random from 
each stratum. On the other hand, the more elaborate the stratification and cross-stratification, the more 
accurate generally will be our estimates. From this we conclude that the number per stratum should be 
as small as possible and consequently exactly two. 

We observe also that when we have two units in each stratum, the computations are especially simple, 
being based solely on the sums and differences of the pairs of values. 

It is evident that the efficiency of the constant number sample will depend, to a large extent, on r, the 
ratio of the largest to the smallest number of units in a stratum. As we have not encountered an appro- 
priate term, we suggest that the ratio of largest to the smallest value in a set be called the geometric range. 
When the geometric range of the numbers in the strata is unity, all numbers are equal, and the efficiency 
is 100 %. As r increases the efficiency falls off. 

From either an empirical or theoretical study (see below), we determine the upper limit to the geo- 
metric range, corresponding to the maximum loss of efficiency which we are disposed to tolerate. Even if 
the observed geometric range is above the limit, we can often bring it below by making a few adjustments. 
Large counties can be divided in two, the extra work being admitted and perhaps provided for by 
allocating additional staff. At the other end, neighbouring small counties can be united in pairs. Evi- 
dently we cannot permit a very extensive readjustment or we should fail to attain the simplicity and 
administrative convenience which we are seeking. 

If the sample constitutes only a small fraction of the population, the efficiency of the constant number 
sample is, to a close approximation, 

$$E = m^2/m_2,$$

where $m$ = average number of units per stratum and $m_2$ = second moment, about zero, of the frequency 
distribution of the number of units in the strata. More exact formulae are given later. 

In any given problem, these moments can be found arithmetically and the efficiency determined. What 
is more valuable, however, is a general indication of how the efficiency depends on the geometric range. 
To obtain this, we must postulate a distribution for n, the number of units in a stratum. Three hypotheses 
will be considered: 

(a) a uniform distribution within the range; 

(b) a uniform distribution of log (n); 

(c) the distribution of n which is the most unfavourable possible, namely, one in which proportions 
r/(r+1) and 1/(r+1) are concentrated at the lower and upper limits respectively. 

The efficiencies, as functions of r, the geometric range, are tabled below: 


                            Efficiency (%) 

Geometric 
range r       Uniform    Uniform logarithm    Worst possible 
    1           100            100                 100 
    2            96             96                  89 
    3            92             91                  75 
    4            89             87                  64 
    5            87             83                  56 
   10            82             71                  33 
    ∞            75              0                   0 
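
The tabled efficiencies can be reproduced from the small-fraction formula E = m²/m₂. The sketch below is an illustration only: the closed forms for the logarithmic and worst-possible cases are not printed in the text but follow from E = m²/m₂ applied to hypotheses (b) and (c), taking the smallest stratum size as 1 and the largest as r. The function names are hypothetical.

```python
from math import log


def eff_uniform(r):
    """Hypothesis (a), n uniform on [1, r]: E = 3(r+1)^2 / (4(r^2+r+1))."""
    return 3 * (r + 1) ** 2 / (4 * (r * r + r + 1))


def eff_log_uniform(r):
    """Hypothesis (b), log(n) uniform on [0, log r]: E = 2(r-1) / ((r+1) log r)."""
    if r == 1:
        return 1.0  # limiting value as r -> 1
    return 2 * (r - 1) / ((r + 1) * log(r))


def eff_worst(r):
    """Hypothesis (c), mass r/(r+1) at n=1 and 1/(r+1) at n=r: E = 4r / (r+1)^2."""
    return 4 * r / (r + 1) ** 2


if __name__ == "__main__":
    print("  r  uniform  log-uniform  worst")
    for r in (1, 2, 3, 4, 5, 10):
        row = (eff_uniform(r), eff_log_uniform(r), eff_worst(r))
        print(f"{r:3d}" + "".join(f"{100 * e:10.0f}" for e in row))
```

Rounded to whole percentages, these functions agree with every finite row of the table.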


In reality, the efficiency may well be greater even than that indicated in the first column. If very 
little or no adjustment is made to bring the distribution within the stipulated range, the variance will 
usually be less than that of a uniform distribution and the efficiency accordingly greater. Again, if the 
average number per stratum, in the sample, is small, the condition of a minimum of two units will mean 
that many of the smaller strata must be united, if we are to achieve a constant sampling fraction. This 
reduction in the intensity of stratification generally increases the error of estimation. 

In round numbers, the table suggests that even if the largest number of elements in a stratum is four 
times the smallest number, the loss of efficiency is only of the order of 10 %. This is surprisingly small. 
In many situations one would willingly pay this price, in order to simplify administration and routine or 
to spread the work uniformly. 

If we are aiming at an elaborate stratification and cross-stratification with the idea of drawing only 
two units from each substratum, we might have been embarrassed by the difficulty of arranging fairly 
constant numbers in the substrata. We now know that the attempt is unnecessary; wide variation in 
numbers may be accepted. Only in the extreme cases need we subdivide or unite substrata. 

Although we have supposed the sampling fraction to be constant, the argument can readily be general- 
ized for a variable sampling fraction. Here we are concerned, not with the variation of the numbers in the 
different strata, but with the variation of the products of these numbers by the respective sampling 
fractions. If these products lie within a range of four or five-fold, the sample will usually be satisfactory. 
When some products fall outside the permitted range, the corresponding strata or substrata should be 
adjusted by subdivision or merging. 


2. THEORY 
Let $T$ = total number of strata, 
$n$ = number of sampling units in a given stratum, 
$N = \sum n$ = number of units in population, $\sum$ denoting summation over the strata, 
$m = N/T$ = average number of units in a stratum, 
$k$ = number of units selected from each stratum for the constant number sample, 
$x$ = the datum (area of farm planted to cotton, etc.), 
$\sigma^2$ = variance of $x$ within the stratum, defined with $n-1$ as the denominator. (Note. We 
do not, at first, assume $\sigma^2$ to be constant), 
$f = k/n$ = sampling fraction in the given stratum, 
$F = k/m$ = general sampling fraction (= sample/population), 
$\alpha = \sum n\sigma^2/T$, 
$\beta = \sum n^2\sigma^2/T$. 
First, let us consider a sample with the same sampling fraction, $F$, for all strata. Clearly, this is the 
nominal sampling fraction; the true sampling fractions will diverge somewhat from $F$. The true value of 
the variance of the estimate will accordingly be higher than the value which we proceed to calculate: 

number of units of given stratum in sample = $Fn$ (ideally), 
variance of estimate of mean of $x$ = $(1-F)\sigma^2/(Fn)$, 
variance of estimated total for the stratum = $n(1-F)\sigma^2/F$, 
hence variance of estimated total for population = $\sum n(1-F)\sigma^2/F = T\alpha(1-F)/F$. (1) 


Secondly, let us consider a sample with the same number, $k$, of elements in each stratum: 
variance of mean of $x$, in given stratum = $(1-f)\sigma^2/k$, 
variance of estimated total for the stratum = $n^2(1-f)\sigma^2/k = (n^2-kn)\sigma^2/k$, 
hence variance of estimated total for the population = $\sum (n^2-kn)\sigma^2/k = T(\beta-k\alpha)/k = T(\beta-\alpha mF)/(mF)$. (2) 





The relative efficiency of the constant number sample, compared with the constant fraction sample, is 
now given as the ratio of expressions (1) and (2): 
$$E = \frac{\alpha m(1-F)}{\beta-\alpha mF}. \qquad (3)$$





The true efficiency may be greater than this, owing to differences, noted earlier, between the nominal 
and real sampling fractions. 

Usually, we may assume that $\sigma^2$ is constant. (Evidently, if $\sigma^2$ varied very much, we would not be 
thinking about using a sample with constant sampling fraction.) If $\sigma^2$ is constant, then 

$$\alpha = \sigma^2\sum n/T = \sigma^2 N/T = \sigma^2 m,$$
$$\beta = \sigma^2\sum n^2/T = \sigma^2 m_2,$$

where $m_2$ denotes the second moment, about zero, of the frequency distribution of $n$. The expression for 
the efficiency then reduces to 
$$E = \frac{m^2(1-F)}{m_2-Fm^2}. \qquad (4)$$


It is interesting to note, however, that this expression remains true, even if $\sigma^2$ does vary between strata, 
provided that $\sigma^2$ is not correlated with $n$. 

If we make the further assumption, that the sampling fraction, $F$, is small, the efficiency becomes 
approximately 
$$E = m^2/m_2, \qquad (5)$$
as was assumed earlier. The same expression is exactly true, whatever the value of $F$, if sampling is 
done with replacement. 
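
As a rough illustration of how expressions (3) to (5) might be evaluated from an observed set of stratum sizes, the sketch below computes α, β and the efficiency directly. The function name `efficiency` and the example sizes are hypothetical and not taken from the survey. With a common within-stratum variance the result reduces to expression (4), and with F = 0 to expression (5).

```python
def efficiency(sizes, F=0.0, variances=None):
    """Relative efficiency of the constant-number sample, expression (3).

    sizes     -- number of units n in each stratum
    F         -- general sampling fraction (0 gives the approximation (5))
    variances -- within-stratum variances; a common value is assumed if omitted
    """
    T = len(sizes)
    if variances is None:
        variances = [1.0] * T  # constant variance: its value cancels in the ratio
    m = sum(sizes) / T
    alpha = sum(n * s2 for n, s2 in zip(sizes, variances)) / T
    beta = sum(n * n * s2 for n, s2 in zip(sizes, variances)) / T
    return alpha * m * (1 - F) / (beta - alpha * m * F)


if __name__ == "__main__":
    sizes = [10, 10, 10, 10, 40]  # hypothetical strata, geometric range r = 4
    print(f"F = 0   : {efficiency(sizes):.3f}")
    print(f"F = 0.10: {efficiency(sizes, F=0.10):.3f}")
```

The example sizes place four-fifths of the strata at the lower limit and one-fifth at the upper, so with F = 0 they reproduce the worst-possible efficiency of 64 % tabled for r = 4.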


The determination of the efficiencies for various theoretical distributions presents no difficulties. For 
example, a uniform distribution of $n$ over the range $1 \ldots r$ has a distribution function 
$$\frac{dn}{r-1}.$$
Hence 
$$m = \int_1^r \frac{n\,dn}{r-1} = \left[\frac{n^2}{2(r-1)}\right]_1^r = \frac{r+1}{2},$$
$$m_2 = \int_1^r \frac{n^2\,dn}{r-1} = \left[\frac{n^3}{3(r-1)}\right]_1^r = \frac{r^2+r+1}{3}.$$
From expression (4) the efficiency is 
$$E = \frac{3(r+1)^2(1-F)}{4(r^2+r+1)-3(r+1)^2F},$$
while, by the approximate formula (5), it is 
$$E = \frac{3(r+1)^2}{4(r^2+r+1)}.$$
For $r = 4$ and $F = 10\,\%$ these give 
$E$ = Exact 88.2 %/Approximate 89.3 %. 
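
The two figures quoted for r = 4 can be checked without the closed forms, by estimating m and m₂ numerically and substituting into expression (4). This is only a cross-check under the uniform hypothesis. The helper names and the midpoint rule are illustrative choices, not part of the original note.

```python
def moments_uniform(r, steps=100_000):
    """Midpoint-rule estimates of m and m2 for n uniform on [1, r]."""
    h = (r - 1) / steps
    mids = [1 + (i + 0.5) * h for i in range(steps)]
    m = sum(mids) * h / (r - 1)
    m2 = sum(x * x for x in mids) * h / (r - 1)
    return m, m2


def efficiency_from_moments(m, m2, F=0.0):
    """Expression (4); with F = 0 it becomes the approximation (5)."""
    return m * m * (1 - F) / (m2 - F * m * m)


if __name__ == "__main__":
    m, m2 = moments_uniform(4)
    print(f"m = {m:.4f}, m2 = {m2:.4f}")
    print(f"exact       (F = 10 %): {100 * efficiency_from_moments(m, m2, 0.10):.1f} %")
    print(f"approximate (F -> 0)  : {100 * efficiency_from_moments(m, m2):.1f} %")
```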





As we are using in São Paulo an overall sampling fraction far smaller than 10 %, the approximate 
formula is quite adequate for our purposes. 
Later, as data become available, we shall determine the efficiencies from the empirical distributions. 


Approximations to the probability integral of the distribution of range 
By N. L. JOHNSON, University College, London 


1. The probability that the range of $n$ independent continuous random variables, each with cumulative 
distribution function $F(x)$, does not exceed $w$ is 
$$P_n(w) = n\int_{-\infty}^{\infty} [F(x)-F(x-w)]^{n-1}\,p(x)\,dx, \qquad (1)$$
where $p(x) = dF(x)/dx$. 
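
For comparison with the approximations that follow, the integral (1) can be evaluated numerically. The sketch below does this for the unit normal distribution with Simpson's rule. The integration limits, the step count and the function names are arbitrary illustrative choices.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(x):
    """Cumulative distribution function F(x) of the unit normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """Density p(x) of the unit normal."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def range_cdf(n, w, lo=-10.0, hi=10.0, steps=4000):
    """P_n(w) = n * integral of [F(x) - F(x-w)]^(n-1) p(x) dx, by Simpson's rule."""
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (norm_cdf(x) - norm_cdf(x - w)) ** (n - 1) * norm_pdf(x)
    return n * total * h / 3.0


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, f"{range_cdf(n, 1.0):.4f}")
```

For n = 2 the result can be checked against the closed form P₂(w) = erf(w/2), since the difference of two unit normal variables is normal with variance 2.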


In this note approximate formulae for $P_n(w)$ will be obtained. These approximations are useful only 
for low values of n and w, but they are of interest in that they indicate the form of the distribution of 
range near the origin (w = 0). It is possible that in specific cases the methods of Pillai (1948), based on the 
mathematical form of F(x), may give somewhat better results, but the generality of the present ap- 
proach seems to be of some interest. 


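2. Expanding $F(x-w)$ by Taylor’s series and retaining terms up to $w^5$, (1) becomes 
$$P_n(w) = nw^{n-1}\int_{-\infty}^{\infty}\left\{p(x) - \tfrac{1}{2}wp^{(1)}(x) + \tfrac{1}{6}w^2p^{(2)}(x) - \tfrac{1}{24}w^3p^{(3)}(x) + \tfrac{1}{120}w^4p^{(4)}(x)\right\}^{n-1}p(x)\,dx, \qquad (2)$$
where $p^{(i)}(x) = d^ip(x)/dx^i$. 

Expanding the integrand and retaining terms up to $w^4$ we find 
$$\begin{aligned}
P_n(w) = nw^{n-1}\Big[&\int p^n - \tfrac{1}{2}(n-1)w\int p^{n-1}p^{(1)} + (n-1)w^2\Big\{\tfrac{1}{6}\int p^{n-1}p^{(2)} + \tfrac{1}{8}(n-2)\int p^{n-2}(p^{(1)})^2\Big\} \\
&- (n-1)w^3\Big\{\tfrac{1}{24}\int p^{n-1}p^{(3)} + \tfrac{1}{12}(n-2)\int p^{n-2}p^{(1)}p^{(2)} + \tfrac{1}{48}(n-2)(n-3)\int p^{n-3}(p^{(1)})^3\Big\} \\
&+ (n-1)w^4\Big\{\tfrac{1}{120}\int p^{n-1}p^{(4)} + (n-2)\Big(\tfrac{1}{72}\int p^{n-2}(p^{(2)})^2 + \tfrac{1}{48}\int p^{n-2}p^{(1)}p^{(3)}\Big) \\
&\qquad + \tfrac{1}{48}(n-2)(n-3)\int p^{n-3}(p^{(1)})^2p^{(2)} + \tfrac{1}{384}(n-2)(n-3)(n-4)\int p^{n-4}(p^{(1)})^4\Big\}\Big], \qquad (3)
\end{aligned}$$
where $\int p^n$ means $\int_{-\infty}^{\infty}[p(x)]^n\,dx$, and so on. 

Provided $p$, $p^{(1)}$, etc., tend to zero at the extremes of the range of variation of $x$, (3) can be considerably 
simplified by integration by parts. This leads to 
$$P_n(w) \doteq nw^{n-1}\Big[\int p^n + \tfrac{1}{24}(n+2)w^2\int p^{n-1}p^{(2)} + \tfrac{1}{5760}(n+4)w^4\Big\{3\int p^{n-1}p^{(4)} + 5(n-1)\int p^{n-2}(p^{(2)})^2\Big\}\Big] \quad \text{if } n \ge 4, \qquad (4.1)$$
$$P_3(w) \doteq 3w^2\Big[\int p^3 + \tfrac{5}{24}w^2\int p^2p^{(2)} + \tfrac{1}{720}w^4\Big\{20\int p(p^{(2)})^2 - 3\int p^2p^{(4)}\Big\}\Big], \qquad (4.2)$$
$$P_2(w) \doteq 2w\Big[\int p^2 - \tfrac{1}{6}w^2\int (p^{(1)})^2 - \tfrac{1}{120}w^4\int p^{(1)}p^{(3)}\Big]. \qquad (4.3)$$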
3. The approximation at (2) is likely to be good only for small values of w, while the further approxima- 
tion at (3) depends also on n not being too large. Thus the formulae will be expected to be good approxi- 
mations only for low values of n and w. An interesting feature is the fact that as w decreases and $P_n(w)$ 
becomes very small, not only the absolute but also the proportional accuracy of the approximation in- 
creases. 

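In order to obtain an appreciation of the accuracy of the approximation we may compare the values 
of $P_n(w)$ for the unit normal distribution ($p(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$) given by (4) with the corresponding 
exact values (Pearson & Hartley, 1942). 

The approximate formulae in these cases are 
$$P_n(w) \doteq \sqrt{n}\left(\frac{w}{\sqrt{2\pi}}\right)^{n-1}\left[1 - \frac{(n-1)(n+2)}{24n}w^2 + \frac{(n-1)(n+4)(5n^2-n+6)}{5760n^2}w^4\right] \quad \text{if } n \ge 4, \qquad (5.1)$$
$$P_3(w) \doteq \sqrt{3}\left(\frac{w}{\sqrt{2\pi}}\right)^{2}\left[1 - \tfrac{5}{36}w^2 + \tfrac{7}{540}w^4\right], \qquad (5.2)$$
$$P_2(w) \doteq \sqrt{2}\left(\frac{w}{\sqrt{2\pi}}\right)\left[1 - \tfrac{1}{12}w^2 + \tfrac{1}{160}w^4\right]. \qquad (5.3)$$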


Specimen values are given in Table 1 A, exact values being shown in brackets. The mode of formation of 
the coefficients in (5) suggests that the formula 


meee *(Zen) ce (6) 


might be useful for somewhat higher values of n. Table 1B shows that this supposition is justified for 
n = 8-15. Neither (5) nor (6) gives good results for values of w much in excess of two. 
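
The comparison between approximate and exact values can be repeated numerically. The following is a minimal sketch in Python, not the computation used for Table 1: it evaluates the exact probability integral of the range in normal samples by Simpson's rule, and sets beside it a two-term small-w expansion. The two-term form is obtained here by specializing the leading terms of (4) to the unit normal density; it omits the $w^4$ term of (5), and the function names, integration limits and step count are assumptions made for illustration.

```python
import math


def phi(x):
    """Unit normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Unit normal probability integral."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf_exact(n, w, lo=-8.0, hi=8.0, steps=4000):
    """P_n(w) = n * integral of phi(x) [Phi(x+w) - Phi(x)]^(n-1) dx (Simpson's rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * phi(x) * (Phi(x + w) - Phi(x)) ** (n - 1)
    return n * total * h / 3.0


def range_cdf_small_w(n, w):
    """Leading term and w^2 correction only (an assumed truncation, see lead-in)."""
    lead = math.sqrt(n) * (w / math.sqrt(2.0 * math.pi)) ** (n - 1)
    return lead * (1.0 - (n - 1) * (n + 2) * w * w / (24.0 * n))


if __name__ == "__main__":
    for n in (2, 3, 5):
        for w in (0.5, 1.0):
            print(n, w, round(range_cdf_small_w(n, w), 4), round(range_cdf_exact(n, w), 4))
```

With only two terms retained the agreement deteriorates more quickly with increasing w than that shown in Table 1A, which is to be expected.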


Table 1A (exact values in brackets)

  w      n = 2       3         4         5         6         7         8         9
 0.5     0.2763    0.0666    0.0152    0.0033    0.0007    0.0001      -         -
        (0.2763)  (0.0666)  (0.0152)  (0.0033)  (0.0007)  (0.0002)     -         -
 1.0     0.5207    0.2410    0.1059    0.0452    0.0190    0.0078    0.0032    0.0013
        (0.5205)  (0.2407)  (0.1057)  (0.0450)  (0.0188)  (0.0078)  (0.0032)  (0.0013)
 1.5     0.7144    0.4671    0.2941    0.1819    0.1118    0.0687    0.0425    0.0265
        (0.7112)  (0.4614)  (0.2865)  (0.1733)  (0.1031)  (0.0606)  (0.0353)  (0.0204)
 2.0     0.8651    0.7188    0.6011      -         -         -         -         -
        (0.8427)  (0.6665)  (0.5096)     -         -         -         -         -

Table 1B (exact values in brackets)

  w      n = 8       9         10        11        12        13        14        15
 1.5     0.0367    0.0212    0.0122    0.0070    0.0040    0.0022    0.0013    0.0007
        (0.0353)  (0.0204)  (0.0117)  (0.0067)  (0.0038)  (0.0022)  (0.0012)  (0.0007)
 2.0     0.1535    0.1100    0.0783    0.0554    0.0391    0.0275    0.0193    0.0135
        (0.1489)  (0.1072)  (0.0768)  (0.0548)  (0.0389)  (0.0276)  (0.0195)  (0.0137)
 2.5     0.3457    0.2819    0.2284    0.1841    0.1478    0.1183    0.0944    0.0751
        (0.3579)  (0.2964)  (0.2443)  (0.2007)  (0.1644)  (0.1342)  (0.1094)  (0.0890)


4. For small values of α we can use the first terms of (5) to determine approximate values of the significance
limit $w_\alpha$ satisfying $P_n(w_\alpha) = \alpha$. In the unit normal case, if α is small enough the simple formula


$w^*_\alpha = \sqrt{2\pi}\,(\alpha/\sqrt{n})^{1/(n-1)}$     (7)

will suffice. From Table 2 it will be seen that (7) gives quite useful results for n < 5, α < 0.025. The more
accurate formula

we? = wal Sa eee et ut (8) 


is useful over a somewhat wider range (see Table 2). 
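
Formula (7) is simple enough to evaluate directly. The sketch below (Python; the function name `w_star` is chosen here for illustration) computes the approximate limit from the leading term alone, so that entries in row (i) of Table 2 and the constants in its last column can be checked.

```python
import math


def w_star(n, alpha):
    """Approximate significance limit of the range from formula (7), unit normal case."""
    return math.sqrt(2.0 * math.pi) * (alpha / math.sqrt(n)) ** (1.0 / (n - 1))


if __name__ == "__main__":
    for n in range(2, 8):
        row = [round(w_star(n, a), 2) for a in (0.001, 0.005, 0.01, 0.025, 0.05, 0.10)]
        # w_star(n, 1.0) is the multiplier of alpha^(1/(n-1)) quoted in Table 2
        print(n, row, round(w_star(n, 1.0), 5))
```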


Table 2. (i) $w^*_\alpha$, (ii) $w^{**}_\alpha$, (iii) $w_\alpha$ (exact value)

  n         100α =  0.1    0.5    1.0    2.5    5.0   10.0     Formula for $w^*_\alpha$

  2  (i)            0.00   0.01   0.02   0.04   0.09   0.18     1.77245 α
     (iii)          0.00   0.01   0.02   0.04   0.09   0.18

  3  (i)            0.06   0.13   0.19   0.30   0.43   0.60     1.90463 α^(1/2)
     (iii)          0.06   0.13   0.19   0.30   0.43   0.62

  4  (i)            0.20   0.34   0.43   0.58   0.73   0.92     1.98951 α^(1/3)
     (iii)          0.20   0.34   0.43   0.59   0.76   0.98

  5  (i)            0.37   0.55   0.65   0.82   0.97   1.15     2.04983 α^(1/4)
     (iii)          0.37   0.55   0.66   0.85   1.03   1.26

  6  (i)            0.53   0.73   0.83   1.00   1.15   1.32     2.09544 α^(1/5)
     (ii)            -      -     0.87   1.07   1.26   1.51
     (iii)          0.54   0.75   0.87   1.06   1.25   1.49

  7  (i)            0.67   0.88   0.99   1.15   1.29   1.45     2.13140 α^(1/6)
     (ii)           0.69   0.92   1.05   1.26   1.47   1.75
     (iii)          0.69   0.92   1.05   1.25   1.44   1.68


5. Formulae (3) and (4) are of general application, and we conclude this note by giving formulae
appropriate to two non-normal distributions:


(i) $p(x) = \dfrac{x^q e^{-x}}{\Gamma(q+1)}$   (0 < x)   (Pearson Type III)

$P_n(w) \simeq \dfrac{n\,\Gamma(nq+1)}{\{\Gamma(q+1)\}^n}\,\dfrac{w^{n-1}}{n^{nq+1}}\left[1 - \dfrac{(n-1)(n+2)}{24(nq-1)}\,w^2\right]$   (valid for q > 2).

(ii) $p(x) = \dfrac{x^r (1-x)^s}{B(r+1, s+1)}$   (0 < x < 1)   (Pearson Type I)

$P_n(w) \simeq n\,\dfrac{B(nr+1, ns+1)}{[B(r+1, s+1)]^n}\,w^{n-1}\left[1 - \dfrac{(n-1)(n+2)(r+s)(nr+ns+1)(nr+ns-2)}{24n(nr-1)(ns-1)}\,w^2\right]$
(valid for r > 2, s > 2).
Statistical control of counting experiments


By H. O. LANCASTER
School of Public Health and Tropical Medicine, Sydney, Australia 


1. Introductory. In counting experiments with n replicate counts from a Poisson population with
parameter m, the commonly used test of consistency of the counts within the set is due to R. A. Fisher
(Fisher, Thornton & Mackenzie, 1922), who pointed out that $\sum_{i=1}^{n} (x_i - \bar{x})^2/\bar{x}$ is distributed approximately
as $\chi^2$ with (n − 1) degrees of freedom. The use of Stirling's approximations in the demonstration
causes some doubt as to whether the test may be used when $\bar{x}$ is small. Sukhatme (1938) reported
a random sampling experiment suggesting that with large n, say greater than 10 or 15, it was sufficient
for the mean to be greater than 1. Cochran (1936) used the method of complete enumeration with
$\bar{x} = 2$ and n = 4 and expressed dissatisfaction with the $\chi^2$ approximation, although his example is
surely over-specialized. Neyman & Pearson (1931) and Shanawany (1936) considered a similar problem
of the distribution of the discrete $\chi^2$ in the multinomial (0.2 + 0.3 + 0.5), and were agreeably
surprised at the accuracy of the approximation.

There is need for some further investigation of the distribution of the discrete $\chi^2$ when the number
of replications is small or when the observed mean is small.
2. A random sampling experiment. From each of four Poisson populations, with means 30, 15,
10 and 5 respectively, 200 sets of four random samples were drawn. These means were chosen to imitate
the lowest levels in haematological counting, so that the methods of statistical control in haematology
(Lancaster, 1950) might be justified. The results are detailed in Table 1; they have been classified
according to the value of $P(\chi^2)$ which for 3 degrees of freedom is given by


$P(\chi^2) = \text{constant} \times \int_{\chi^2}^{\infty} (\chi^2)^{1/2}\, e^{-\chi^2/2}\, d\chi^2.$


The results suggest that there can be no objection to using the methods with four parallel counts
when the expected mean is as low as five.
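
An experiment of the same design is easily repeated by machine. The sketch below (Python) is an illustration under stated assumptions, not a reconstruction of the original drawings: Poisson counts are generated by Knuth's multiplication method, sets with zero total are redrawn because $\chi^2$ is then undefined, and the tail probability for 3 degrees of freedom is taken from its closed form. The seed and all function names are arbitrary choices made here.

```python
import math
import random


def poisson(rng, m):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-m)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def chi2_sf_3df(x):
    """P(chi^2) for 3 degrees of freedom (closed form of the integral above)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Lower boundaries of the classes 1.0-0.9, 0.9-0.7, 0.7-0.5, 0.5-0.3, 0.3-0.1
EDGES = [0.9, 0.7, 0.5, 0.3, 0.1]


def classify(p):
    """Index of the probability class, in the row order of Table 1."""
    for i, edge in enumerate(EDGES):
        if p >= edge:
            return i
    return len(EDGES)  # class 0.1 to 0.0


def experiment(m, sets=200, n=4, seed=1):
    rng = random.Random(seed)
    counts = [0] * 6
    done = 0
    while done < sets:
        xs = [poisson(rng, m) for _ in range(n)]
        mean = sum(xs) / n
        if mean == 0:
            continue  # chi^2 undefined for an all-zero set; redraw (an assumption)
        chi2 = sum((x - mean) ** 2 for x in xs) / mean
        counts[classify(chi2_sf_3df(chi2))] += 1
        done += 1
    return counts


if __name__ == "__main__":
    for m in (5, 10, 15, 30):
        print(m, experiment(m), "expected", [20, 40, 40, 40, 40, 20])
```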


Table 1. A random sampling experiment, with 200 sets of 4 counts, simulating red cell counting

                          Observed frequencies, according to
  Probability classes     value of the expectation m            Expected frequencies
  defined by P(χ²)        m = 5      10      15      30         on χ² theory

  1.0 to 0.9                24       24      21      20               20
  0.9 to 0.7                41       38      39      38               40
  0.7 to 0.5                38       34      47      39               40
  0.5 to 0.3                36       34      37      44               40
  0.3 to 0.1                41       49      39      33               40
  0.1 to 0.0                20       21      17      26               20

  Total                    200      200     200     200              200


3. Statistical control with sets of two counts. With duplicate counts, the effects of discontinuity are
marked; for, with a low set total, $S = x_1 + x_2$, there can be but few values of their difference, d, and so
but few values of $\chi^2$, which takes the simple form $(x_1 - x_2)^2/S = d^2/S$. This problem was considered by
Przyborowski & Wilenski (1939), who gave a table of what may be termed nominal significance levels
for $x_1$ (20, 10, 5 and 1 % for a double-tail test or half these figures for a single-tail test) for all values
of S between 1 and 80.* They also showed how much the chance of rejecting the hypothesis of a common
expectation, m, for the two counts, when it is true, falls below the nominal levels owing to the effect
of discontinuity. Thus if the 5 % levels given in their table are used to determine whether $x_1$ and $x_2$
differ significantly, in duplicate counts from Poisson populations, when there is a common mean,
m = 5, in the long run of sampling only 2.2 % of the pairs would be rejected in place of the nominal 5 %.
This figure is only increased to 3.7 % if the Poisson population has a mean of m = 25. In fact, when
m is in the neighbourhood of 5, the nominal 10 % levels given by Przyborowski & Wilenski correspond
more nearly in the long run to a true 5 % level.

No better control would be obtained by using $\chi^2$ corrected for continuity (Yates, 1934). An acceptable
solution from the theoretical point of view would be to use random sampling numbers after the manner
suggested by Tocher (1950) or Stevens (1950) and so obtain an expectation equal to the theoretical
for every value of S. This is an unsatisfactory solution in practice, however, as it involves computing
terms of the symmetrical binomial for every different value of S that occurs. A simple solution is to
use the uncorrected or crude $\chi^2$ which gives approximately correct results in the long run. Investigations
carried out by the author have shown that the tabled probability $P(\chi^2)$ corresponding to the
crude $\chi^2 = d^2/S$ approximates closely to the median probability


$\tfrac{1}{2}\{P(d \mid S) + P(d+1 \mid S)\}$
discussed by the author in an earlier paper (Lancaster, 1949).
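
The closeness of the crude $\chi^2$ probability to the median probability may be illustrated for a single pair of counts. In the sketch below (Python) the conditional distribution of $x_1$ given S is taken as the symmetrical binomial, and the median probability is read, as an assumption of this illustration, as the mean of the two-tail probability including the observed difference and the two-tail probability excluding it. The function names are hypothetical.

```python
import math


def crude_p(x1, x2):
    """Tabled P(chi^2), 1 degree of freedom, for the crude chi^2 = d^2 / S."""
    chi2 = (x1 - x2) ** 2 / (x1 + x2)
    return math.erfc(math.sqrt(chi2 / 2.0))


def exact_tail(x1, x2, strict=False):
    """Pr(|x1 - x2| >= d | S) under Binomial(S, 1/2); strict=True gives Pr(> d)."""
    s, d = x1 + x2, abs(x1 - x2)
    total = 0
    for a in range(s + 1):
        diff = abs(2 * a - s)
        if diff > d or (diff == d and not strict):
            total += math.comb(s, a)
    return total / 2 ** s


def median_p(x1, x2):
    """Mean of the tail sums including and excluding the observed difference."""
    return 0.5 * (exact_tail(x1, x2) + exact_tail(x1, x2, strict=True))


if __name__ == "__main__":
    x1, x2 = 12, 4  # S = 16, d = 8, crude chi^2 = 4
    print(round(crude_p(x1, x2), 4), round(exact_tail(x1, x2), 4), round(median_p(x1, x2), 4))
```

For the pair (12, 4) the crude probability lies much nearer to the median probability than to the full exact tail sum, in line with the statement above.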


4. Sets of three counts. With sets of three counts, it has been practicable to compute the relative
frequencies for all possible configurations for set totals, $S = x_1 + x_2 + x_3$, from 6 to 19. To test whether
statistical control would be justified in this range, I have supposed that sets of three counts have each
arisen from a Poisson population, with parameter m = 5, and also from one with parameter m =4;. 
Each set has then been assigned by the use of $\chi^2$ to one of the six $P(\chi^2)$ probability classes, 0 to 0.1,
0.1 (0.2) to 0.9, 0.9 to 1.0. With the first of these populations the expected frequencies of assignment
are 9.9, 19.5, 21.4, 20.9, 16.3 and 12.0 %, and with the second 10.1, 20.1, 21.8, 19.1, 17.3 and 11.6 %,


* [This table, with the addition of a column giving 2 % levels for the double-tail test will be included 
in the new Biometrika Tables for Statisticians. Ed.] 
instead of 10, 20, 20, 20, 20 and 10 %. Any other Poisson population could be similarly tested, but it
seems safe to generalize from these results and to state that statistical control may be carried out with 
a set total S > 6 provided that an undue number of counts with set total at this level are not introduced. 
For counts with set total above 19, there appears no reason for doubting that the agreement would be 
even better. In practice, as in haematology or bacteriology, such a distribution of count size would 
hold that only a small proportion of such counts would have a mean less than 5 or 6. So all these 
could be admitted to the statistical control records. 


5. Sets of four or five counts. Similar reasoning may be applied to sets of four or five counts. For 
n = 4, I have enumerated all cases for S of 6 to 16. For S>10 and n= 4 or 5, it appears that the 
nominal and effective frequencies of assignment would closely approximate, although I have not 
enumerated the cases with n = 5 beyond 10. The test of the agreement between the effective and 
nominal frequencies is to suppose that the counts come from some Poisson population and to enumerate 
the frequencies of assignment for each S as before. 


6. The effect of varying the number of replications with fixed set total. To illustrate the computations
and to try to assess the effect of increasing the number of counts in the set with a fixed set total, a set
total of S = 16 was chosen as being neither too small nor offering insuperable difficulties with computation.
Every possible configuration was formed and the corresponding value of $\chi^2$ and frequency
of occurrence computed for sets of two to eight counts (n = 2 to 8). It is apparent that there are too
few configurations possible when n = 2 to expect any close approximation between effective and
nominal frequencies in the probability classes. With increasing n, the number of configurations
increases, being the number of partitions of S with at most n parts. But the greatest increase in the
number of configurations takes place in the probability class, 0 to 0.1, and many of these configurations
occur with excessive rarity. Table 2 shows that in the probability range, 0.1 to 1.0, there are 4 configurations
with n = 2, 9 with n = 3, 20 with n = 4 and 31 with n = 5. In this range, a comparatively
small increase of configurations occurs with further increase in n. Some cases of marked disagreement
between nominal and effective frequencies of assignment may be found for each n. Thus with n = 7,
only 9.95 % of sets are assignable to the probability class 0.3 to 0.5 instead of the nominal 20 %.
Sukhatme's (1938) explanation of his random sampling experiment can be made a little more precise.
The close agreement between nominal and effective frequencies in his random sampling experiments
was due to his being concerned with many different values of S. S, in fact, was a Poisson variable.
But similar results would be obtained if S were given any distribution not arbitrarily specialized after
the manner of Cochran's (1936) example.
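
The enumeration described in this section can be sketched as follows (Python). Each partition of S into at most n parts is taken as one configuration; its frequency is the multinomial probability of one ordering, with equal cell probabilities, multiplied by the number of distinct orderings. The tail probability of $\chi^2$ with n − 1 degrees of freedom is built up from a closed-form recurrence. This is an illustrative reconstruction under those assumptions, and the function names are chosen here.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper tail of chi-square with integer df, by recurrence on df."""
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def partitions(total, parts, largest=None):
    """Yield non-increasing tuples of `parts` non-negative integers summing to `total`."""
    if largest is None:
        largest = total
    if parts == 1:
        if total <= largest:
            yield (total,)
        return
    for first in range(min(total, largest), -1, -1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest


def class_table(total=16, n=3):
    """Percentage frequency and number of configurations in the six classes
    0-0.1, 0.1-0.3, 0.3-0.5, 0.5-0.7, 0.7-0.9, 0.9-1.0."""
    edges = [0.1, 0.3, 0.5, 0.7, 0.9]
    freq, configs = [0.0] * 6, [0] * 6
    mean = total / n
    for cfg in partitions(total, n):
        chi2 = sum((x - mean) ** 2 for x in cfg) / mean
        cls = sum(chi2_sf(chi2, n - 1) >= e for e in edges)
        ways = math.factorial(total)  # multinomial coefficient for one ordering
        for x in cfg:
            ways //= math.factorial(x)
        orderings = math.factorial(n)  # distinct arrangements of the counts
        for mult in Counter(cfg).values():
            orderings //= math.factorial(mult)
        freq[cls] += 100.0 * ways * orderings / n ** total
        configs[cls] += 1
    return freq, configs


if __name__ == "__main__":
    for n in range(2, 9):
        freq, configs = class_table(16, n)
        print(n, [round(f, 2) for f in freq], configs, sum(configs))
```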


Table 2. The effect of varying the number of replications on the frequency of assignment of
$\chi^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/\bar{x}$ to six $P(\chi^2)$ probability classes, when the set total is 16 (i.e. $\sum_{i=1}^{n} x_i = 16$)

  No. of
  counts in     The frequency of assignment (as a percentage) to the probability classes              Total
  the set                                                                                              configura-
  (n)           0 to 0.1      0.1 to 0.3    0.3 to 0.5    0.5 to 0.7    0.7 to 0.9    0.9 to 1.0     tions

   2             7.68 (5)     13.33 (1)     24.44 (1)     34.91 (1)      0    (0)     19.64 (1)       (9)
   3            10.93 (21)    13.47 (3)     29.72 (3)     20.09 (1)     11.72 (1)     14.06 (1)      (30)
   4            10.51 (44)    15.50 (9)     27.89 (5)     24.90 (3)      5.64 (1)     15.56 (2)      (64)
   5             8.54 (70)    18.68 (15)    29.74 (9)      7.94 (1)     22.98 (4)     12.12 (2)     (101)
   6             9.61 (98)    19.83 (17)    17.95 (7)     22.07 (7)     21.38 (4)      9.16 (3)     (136)
   7             8.87 (119)   29.95 (26)     9.95 (3)     23.66 (9)     21.04 (4)      6.54 (3)     (164)
   8             7.26 (134)   20.19 (23)    20.45 (12)    25.32 (7)     20.25 (7)      6.53 (3)     (186)
   9                  (149)         (25)          (10)          (5)           (7)           (5)     (201)
  10                  (162)         (19)          (13)          (8)           (4)           (6)     (212)
  11                  (169)         (19)          (12)          (9)           (4)           (6)     (219)

The number of configurations assignable to each class is given in parentheses after the percentage
frequency. For n = 9, 10 and 11 the frequencies have not been computed.


7. The goodness of approximation of the discrete $\chi^2$ to the continuous $\chi^2$ distribution. The use of the
$\chi^2$ distribution is usually justified by Stirling's approximation. It seems better, however, to define
$\chi^2 = \sum (x_i - \bar{x})^2/\bar{x}$ and then show that $\chi^2$ has a mean and variance close to those of the standard $\chi^2$
with (n − 1) degrees of freedom and that one may use this distribution as an approximation to obtain
the probabilities. Haldane (1937, p. 136) showed that in the present problem the expectation of $\chi^2$
is (n − 1) and its variance is $2(n-1)(1 - S^{-1})$, so that the variance of the discrete $\chi^2$ is slightly less than
that of the theoretical. One might expect a deficiency of observed numbers of large values of $\chi^2$ in
the multinomial. This appears to be the case. In other words, the discrete $\chi^2$ is conservative and
tends to over-estimate the probability when the $\chi^2$ is high, say at the 1 % level, when S is not large.
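
The mean and variance quoted from Haldane can be checked by complete enumeration for a small case. The following sketch (Python; the function name is chosen here) sums over every ordered outcome of S events falling into n equally likely cells. It is practicable only when S and n are small.

```python
import math
from itertools import product


def discrete_chi2_moments(total, n):
    """Exact mean and variance of sum((x_i - mean)^2)/mean for fixed set total."""
    mean_count = total / n
    m1 = m2 = 0.0
    for head in product(range(total + 1), repeat=n - 1):
        last = total - sum(head)
        if last < 0:
            continue
        cfg = head + (last,)
        ways = math.factorial(total)  # multinomial coefficient of this outcome
        for x in cfg:
            ways //= math.factorial(x)
        prob = ways / n ** total
        chi2 = sum((x - mean_count) ** 2 for x in cfg) / mean_count
        m1 += prob * chi2
        m2 += prob * chi2 * chi2
    return m1, m2 - m1 * m1


if __name__ == "__main__":
    S, n = 16, 3
    mean, var = discrete_chi2_moments(S, n)
    print(mean, n - 1)                      # expectation against n - 1
    print(var, 2 * (n - 1) * (1 - 1 / S))   # variance against 2(n-1)(1 - 1/S)
```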


8. Discussion and summary. The practical importance of the technique of statistical control of 
counting is so great that it has been worth while to examine in detail the effects of small expectations 
on the frequency of assignment of the sets of counts to the probability classes, under the hypothesis 
that the conditions of counting are ideal. It has appeared that no great departures from the nominal
frequencies in a commonly used group of probability classes are likely to arise in practice.

It seems unlikely that $\chi^2$ will be supplanted by other methods as the test of consistency of counts
in a set. The use of the random sampling numbers technique of Tocher (1950) and Stevens (1950)
would be quite impracticable for sets of more than two or three counts.

$\chi^2$ is likely to remain the method of choice in statistical control of counting, regardless of the size
of the sample.


This paper is published with the approval of the Director-General of Health, Dr A. J. Metcalfe. 
Upper 5 and 1 % points of the maximum F-ratio
By H. A. DAVID, University College, London 


1. The ratio of the largest to the smallest in a set of k independent mean squares, $s_t^2$ (t = 1, 2, ..., k),
all based on the same number of degrees of freedom, ν, was introduced by Hartley (1950) as a short-cut
criterion in testing for heterogeneity of variance. Assuming normally distributed variables, he gave
a table of the upper 5 % points of what he termed $F_{\max} = s^2_{\max}/s^2_{\min}$, to which the corresponding
ratio, calculated from the data, can immediately be referred.

The tabular values were exact for k = 2 and for ν = 2, ∞. For other k and ν, $F_{\max}$ at the upper
100α % point was first calculated from the approximate relation


$F'_{\max} = \exp\{w_k(\alpha)\,\sqrt{[2/(\nu - 1)]}\}$,     (1)


where $w_k(\alpha)$ is the upper 100α % point of the range, $w_k$, in k independent unit normal variates. Finally,
Hartley used the two sets of exact results, k = 2 and ν = 2, to modify this approximation $F'_{\max}$ by the
adjustment

Pie (adjusted) = Fyys/1 +4) QX); (2) 


with g, and q, fitted to the exact values at k = 2 and at v = 2. 
At Dr Hartley's suggestion upper 5 and 1 % points of $F_{\max}$ are here calculated by exact quadrature
methods, and it is found that his approximations tend, in general, to underestimate these percentage
points. A corrected table of 5 % points and a new table of 1 % points are appended.
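
The unadjusted approximation (1) may be illustrated numerically. In the sketch below (Python) the upper percentage point of the range in k unit normal variates is found by bisection on a Simpson's-rule evaluation of its probability integral, and is then inserted in (1). The integration limits, step count, bisection bracket and function names are assumptions made for this illustration, and the adjustment (2) is not attempted.

```python
import math


def _phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf(k, w, lo=-8.0, hi=8.0, steps=2000):
    """Pr(range of k unit normal variates <= w), by Simpson's rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * _phi(x) * (_Phi(x + w) - _Phi(x)) ** (k - 1)
    return k * total * h / 3.0


def range_upper_point(k, alpha):
    """Upper 100*alpha % point of the range, by bisection."""
    lo, hi = 0.0, 12.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if range_cdf(k, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fmax_first_approx(k, nu, alpha):
    """Relation (1): exp{ w_k(alpha) * sqrt(2 / (nu - 1)) }."""
    return math.exp(range_upper_point(k, alpha) * math.sqrt(2.0 / (nu - 1.0)))


if __name__ == "__main__":
    print(round(range_upper_point(2, 0.05), 4))
    print(round(fmax_first_approx(2, 60, 0.05), 3))
```

For k = 2 and ν = 60 the result agrees to two decimals with the exact 5 % point 1.67 in the appended table. For ν = 2 the same relation gives a value of about 50 against the exact 39.0, which indicates why some adjustment was required for small ν.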


2. Since $F_{\max}$ is the ratio of the largest to the smallest of k independent values of $\chi^2$ (each based on ν
degrees of freedom), it follows that its probability integral is given by


$I(F) = \Pr.(F_{\max} \le F) = k \int_0^{\infty} p(x)\,[P(Fx) - P(x)]^{k-1}\,dx$,     (3)


where $p(x) = \dfrac{1}{2^{\nu/2}\,\Gamma(\tfrac{1}{2}\nu)}\; x^{\nu/2 - 1}\, e^{-x/2}$   (x > 0),

and $P(Fx) = \int_0^{Fx} p(x)\,dx$.


Now P(Fx) quickly approaches unity as x increases. The interval of integration may therefore conveniently
be split at a point $x_0$, such that $1 - P(Fx_0) < \epsilon$, where ε can be taken as zero to sufficient
accuracy. Thus


$I(F) = k \int_0^{x_0} p(x)\,[P(Fx) - P(x)]^{k-1}\,dx + k \int_{x_0}^{\infty} p(x)\,[1 - P(x)]^{k-1}\,dx$

$\qquad \simeq k \int_0^{x_0} p(x)\,[P(Fx) - P(x)]^{k-1}\,dx + [1 - P(x_0)]^{k}$

$\qquad = I_1 + I_2$ (say).     (4)


With suitable intervals for x and appropriate values of F, both P(x) and P(Fx) can in most cases be
read off without interpolation from tables of the probability integral of $\chi^2$ (Hartley & Pearson, 1950; for
even ν, also Molina, 1942). Furthermore, we have


$2p(\chi^2_{\nu}) = P(\chi^2_{\nu-2}) - P(\chi^2_{\nu})$     (5)
so that $I_1$ (and hence I) may be determined by a suitable integration formula.
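
The quadrature of (3) may be sketched as follows (Python). This is an illustration under stated assumptions rather than the procedure actually followed: the integral is not split at $x_0$ but simply carried by Simpson's rule to a fixed upper limit, the probability integral of $\chi^2$ is built up by repeated use of relation (5) from its closed form for 1 or 2 degrees of freedom, and ν is restricted to integers of 2 or more so that the density is finite at the origin. Function names, the upper limit and the step count are chosen here.

```python
import math


def chi2_pdf(x, nu):
    """Density p(x) of chi-square with nu degrees of freedom (nu >= 2 assumed)."""
    return x ** (nu / 2.0 - 1.0) * math.exp(-x / 2.0) / (2.0 ** (nu / 2.0) * math.gamma(nu / 2.0))


def chi2_cdf(x, nu):
    """P(x) for integer nu; each step of the loop applies relation (5) to the upper tail."""
    if x <= 0.0:
        return 0.0
    if nu % 2:
        q, j = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, j = math.exp(-x / 2.0), 2
    while j < nu:
        q += (x / 2.0) ** (j / 2.0) * math.exp(-x / 2.0) / math.gamma(j / 2.0 + 1.0)
        j += 2
    return 1.0 - q


def fmax_cdf(F, nu, k, upper=80.0, steps=40000):
    """I(F) = k * integral of p(x) [P(Fx) - P(x)]^(k-1) dx over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * chi2_pdf(x, nu) * (chi2_cdf(F * x, nu) - chi2_cdf(x, nu)) ** (k - 1)
    return k * total * h / 3.0


if __name__ == "__main__":
    # Entries in the rows nu = 2 of the appended tables, stated there to be exact
    print(round(fmax_cdf(39.0, 2, 2), 4))
    print(round(fmax_cdf(87.5, 2, 3), 4))
    print(round(fmax_cdf(199.0, 2, 2), 4))
```

For ν = 2 the integral can be written down in closed form, which provides a check on the quadrature.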


3. To calculate $F_{\max}$ at both the 5 and the 1 % point for a given (ν, k) we proceeded as follows:

By means of the tabulated approximate 5 % point and a rough 1 % point obtained from (1), two
convenient limits of F were chosen, one to lie just below the exact 5 % point, the other just above the
exact 1 % point. A further two or three intermediate F-values were then selected. We therefore arrived
by means of (4) at four or five values of I(F), enclosing the range 0.95–0.99. These four or five values were
too few to allow the roots, F, of

I(F) = 0.95, 0.99


to be obtained directly, by inverse interpolation. Use was therefore made of the above approximation to
the probability integral $\Pr.(F_{\max} \le F)$ by the probability integral of range in normal samples of k
(Pearson & Hartley, 1942), namely, writing I′(F) for this approximation to I(F),


$I'(F) = \Pr.(w_k \le w)$,     (6)
where $w = \sqrt{[\tfrac{1}{2}(\nu - 1)]}\,\log F$.     (1′)


Differences ΔI = I′(F) − I(F) were formed for each of the above F-values and attached to the arguments
P = I(F). Direct interpolation in ΔI(P) at (irregularly spaced) arguments P was possible, interpolates
ΔI(0.95), ΔI(0.99) were obtained and hence corresponding values of I′(F), viz. 0.95 + ΔI(0.95) and
0.99 + ΔI(0.99), could be formed. Using these the required percentage points were found by inverse
interpolation in the range tables.


4. By the foregoing method a framework of exact F-values was constructed for all (ν, k) given by
ν = 3, 4, 6, 12 and k = 2, 4, 6, 8, 10, 12. For a fixed ν, overlapping F-values were used as much as possible.
The integrands of $I_1$, for given (ν, F), were built up successively for increasing k.

The remaining $F_{\max}$ entries were obtained by interpolation in the framework extended by the addition
of the known exact results. Again, not $F_{\max}$ itself but $\delta F = F_{\max} - F'_{\max}$ was used as the basic
function. Interpolation ν-wise was harmonic on $\delta F/F_{\max}$ and interpolation k-wise direct on δF.


I am indebted to Dr H. O. Hartley for advice in planning the computations and to Miss Joyce M. May 
for carrying out many of the quadratures involved. 
Upper percentage points of the ratio $s^2_{\max}/s^2_{\min}$ in a set of k mean squares,
each based on ν degrees of freedom. (Normal variation assumed)


(a) 5 % points

  ν \ k     2      3      4      5      6      7      8      9      10     11     12

   2      39.0   87.5   142    202    266    333    403    475    550    626    704
   3      15.4   27.8   39.2   50.7   62.0   72.9   83.5   93.9   104    114    124
   4      9.60   15.5   20.6   25.2   29.5   33.6   37.5   41.1   44.6   48.0   51.4
   5      7.15   10.8   13.7   16.3   18.7   20.8   22.9   24.7   26.5   28.2   29.9

   6      5.82   8.38   10.4   12.1   13.7   15.0   16.3   17.5   18.6   19.7   20.7
   7      4.99   6.94   8.44   9.70   10.8   11.8   12.7   13.5   14.3   15.1   15.8
   8      4.43   6.00   7.18   8.12   9.03   9.78   10.5   11.1   11.7   12.2   12.7
   9      4.03   5.34   6.31   7.11   7.80   8.41   8.95   9.45   9.91   10.3   10.7
  10      3.72   4.85   5.67   6.34   6.92   7.42   7.87   8.28   8.66   9.01   9.34

  12      3.28   4.16   4.79   5.30   5.72   6.09   6.42   6.72   7.00   7.25   7.48
  15      2.86   3.54   4.01   4.37   4.68   4.95   5.19   5.40   5.59   5.77   5.93
  20      2.46   2.95   3.29   3.54   3.76   3.94   4.10   4.24   4.37   4.49   4.59
  30      2.07   2.40   2.61   2.78   2.91   3.02   3.12   3.21   3.29   3.36   3.39
  60      1.67   1.85   1.96   2.04   2.11   2.17   2.22   2.26   2.30   2.33   2.36
   ∞      1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

































































(b) 1 % points

  ν \ k     2      3      4      5      6      7      8      9      10     11     12

   2      199    448    729    1036   1362   1705   2063   2432   2813   3204   3605
   3      47.5   85     120    151    184    21(6)  24(9)  28(1)  31(0)  33(7)  36(1)
   4      23.2   37     49     59     69     79     89     97     106    113    120
   5      14.9   22     28     33     38     42     46     50     54     57     60

   6      11.1   15.5   19.1   22     26     27     30     32     34     36     37
   7      8.89   12.1   14.5   16.5   18.4   20     22     23     24     26     27
   8      7.50   9.9    11.7   13.2   14.5   15.8   16.9   17.9   18.9   19.8   21
   9      6.54   8.5    9.9    11.1   12.1   13.1   13.9   14.7   15.3   16.0   16.6
  10      5.85   7.4    8.6    9.6    10.4   11.1   11.8   12.4   12.9   13.4   13.9

  12      4.91   6.1    6.9    7.6    8.2    8.7    9.1    9.5    9.9    10.2   10.6
  15      4.07   4.9    5.5    6.0    6.4    6.7    7.1    7.3    7.5    7.8    8.0
  20      3.32   3.8    4.3    4.6    4.9    5.1    5.3    5.5    5.6    5.8    5.9
  30      2.63   3.0    3.3    3.4    3.6    3.7    3.8    3.9    4.0    4.1    4.2
  60      1.96   2.2    2.3    2.4    2.4    2.5    2.5    2.6    2.6    2.7    2.7
   ∞      1.00   1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
































Values in the column k = 2 and in the rows ν = 2 and ∞ are exact. Elsewhere the third digit may
be in error by a few units for the 5 % points and several units for the 1 % points. The third digit figures
in brackets for ν = 3 are the most uncertain.
The conditions under which Gram-Charlier and Edgeworth curves
are positive definite and unimodal


By D. E. BARTON and K. E. DENNIS,
Department of Statistics, University College, London


It happens not infrequently that the moments of the probability distribution function (p.d.f.) of a
random variable x can be determined but not the p.d.f. itself. When the measures of shape, $\beta_1$ and $\beta_2$,
are known a curve of the form

$f(x) = p_n(x)\,\dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$,


chosen to have the same first four moments as the p.d.f. of x, is often taken to represent it, where

$p_n(x) = 1 + \sum_{r=1}^{n} c_r H_r(x)$


is an nth order polynomial in x expressed as a sum of constant multiples of Hermite polynomials, $H_r(x)$.

When x is in standardized form, the values

$c_1 = 0 = c_2$,   $c_3 = \sqrt{\beta_1}/3!$,   $c_4 = (\beta_2 - 3)/4!$,
are taken with either   $c_r = 0$ for $r \ge 5$
or   $c_5 = 0$,   $c_6 = \tfrac{1}{2}c_3^2$,   $c_r = 0$ for $r \ge 7$.
We shall refer to these two cases as the Gram-Charlier and the Edgeworth series respectively.

Two disadvantages of these curves in representing probability distributions are: (a) that for ($\beta_1$, $\beta_2$)
distant from the normal values (0, 3) we have f(x) negative for some part of the range of x and, (b) that the
curves may be multimodal. The purpose of this note is to find the region in the ($\beta_1$, $\beta_2$) plane for which
f(x) is positive definite and unimodal. We are thus extending the work of Shenton (1951).

Now f(x) is positive-definite if

$p_n(x) = 1 + \sum_{r=1}^{n} c_r H_r(x) > 0$


for all x. Regarding $(c_1, ..., c_n)$ as co-ordinates of a point P in n-dimensioned space, this condition implies
that P lies on the same side as (0, ..., 0) of the hyperplane


$1 + \sum_{r=1}^{n} c_r H_r(x) = 0$


for all x, that is, that it lies within the envelope generated by these planes as x goes from −∞ to +∞.
This envelope has equation given parametrically by


$1 + \sum_{r=1}^{n} c_r H_r(x) = 0 = \sum_{r=1}^{n} c_r\, r\, H_{r-1}(x)$,


and explicitly by the discriminant of this, a (2n − 2)th-order polynomial in the c's which may be found
in determinantal form by the dialytic method of Sylvester.

Since we are only interested in the branches of the surface represented which form a closed curve or 
surface round (0,...,0), we shall use the discriminant to decide on these branches and the parametric 
form to draw them. 
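
For a given pair ($\beta_1$, $\beta_2$) the two properties can also be examined by direct computation. The sketch below (Python) is a numerical check on a grid, not the discriminant method of this note: it evaluates $p_n(x)$ over a finite range to look for negative ordinates, and counts modes as the changes of sign, from negative to positive, of $H_1(x) + \sum c_r H_{r+1}(x)$, which is the polynomial factor of $-df/dx$. The grid limits and all function names are assumptions made here, and negative ordinates lying beyond the grid would be missed.

```python
import math


def hermite(r, x):
    """Hermite polynomial H_r(x) with H_{r+1} = x H_r - r H_{r-1}, H_0 = 1, H_1 = x."""
    if r == 0:
        return 1.0
    h_prev, h = 1.0, x
    for j in range(1, r):
        h_prev, h = h, x * h - j * h_prev
    return h


def coefficients(beta1, beta2, edgeworth=False):
    """Non-zero c_r of the Gram-Charlier (default) or Edgeworth series."""
    c = {3: math.sqrt(beta1) / 6.0, 4: (beta2 - 3.0) / 24.0}
    if edgeworth:
        c[6] = 0.5 * c[3] ** 2
    return c


def _grid(lo, hi, steps):
    return (lo + i * (hi - lo) / steps for i in range(steps + 1))


def is_positive(c, lo=-12.0, hi=12.0, steps=4800):
    """True if p_n(x) >= 0 at every grid point."""
    return all(1.0 + sum(cr * hermite(r, x) for r, cr in c.items()) >= 0.0
               for x in _grid(lo, hi, steps))


def count_modes(c, lo=-12.0, hi=12.0, steps=4800):
    """Number of maxima of f(x) located on the grid."""
    modes, prev = 0, None
    for x in _grid(lo, hi, steps):
        q = hermite(1, x) + sum(cr * hermite(r + 1, x) for r, cr in c.items())
        if prev is not None and prev < 0.0 <= q:
            modes += 1
        prev = q
    return modes


if __name__ == "__main__":
    for beta1, beta2 in [(0.0, 3.0), (0.0, 5.0), (0.0, 7.0), (0.0, 8.0), (0.25, 3.5)]:
        for edgeworth in (False, True):
            c = coefficients(beta1, beta2, edgeworth)
            print(beta1, beta2, "Edgeworth" if edgeworth else "Gram-Charlier",
                  is_positive(c), count_modes(c))
```

With $\beta_1 = 0$ the Gram-Charlier polynomial reduces to $1 + c_4 H_4(x)$, whose minimum for positive $c_4$ is $1 - 6c_4$, so that the check should change its verdict at $\beta_2 = 7$.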

Turning to the unimodal problem in the two cases we consider

$\dfrac{df(x)}{dx} = 0.$

This has an odd, finite number of real roots since they are the roots of the odd order polynomial with real
coefficients:

$H_1(x) + \sum_{r=1}^{n} c_r H_{r+1}(x) = 0$


(by virtue of the differential relation governing Hermite polynomials). The region in which f(x) is
unimodal is therefore that bounded by the curve or surface in which f(x) passes from one to three modes
and thus f(x) has a point of inflexion. It is given parametrically by


$H_1(x) + \sum_{r=1}^{n} c_r H_{r+1}(x) = 0 = H_2(x) + \sum_{r=1}^{n} c_r H_{r+2}(x)$,


and explicitly by the discriminant as before.
($\beta_1$, $\beta_2$) plane showing regions of unimodal curves and regions of curves composed entirely of non-negative ordinates.
For the curves particularly considered here, the Gram-Charlier and the Edgeworth series involving
only the first four moments, we may consider the ($\beta_1$, $\beta_2$) space in place of that of the c's. The regions of
positivity and unimodality are shown in the accompanying diagram for each of the two types of curves.
(The outer boundary is that given by Shenton (1951, Table 1 and Fig. 1).) It will be noticed from the
diagram that the Gram-Charlier and Edgeworth series having as parameters the first four standardized
moments only of the p.d.f. of a random variable, are themselves true p.d.f.'s over only a small portion of
the ($\beta_1$, $\beta_2$) plane. The positivity region for the Edgeworth curves lies almost entirely within that of the
Gram-Charlier and is less than it. This was expected because the Edgeworth series is more restrictive on
the parameters than the Gram-Charlier. The regions for which the curves are unimodal lie inside the
respective positivity regions. Again we should expect this on general grounds, and we should also expect
the Edgeworth region to be less than the Gram-Charlier.

The boundaries drawn in the diagram define, of course, certain mathematical properties. It is quite
possible that a curve with only one mode may nevertheless have a 'bump' on its side, between points of
inflexion, which makes it unlikely to be useful in graduation. On the other hand, a curve with its ($\beta_1$, $\beta_2$)
point outside the region of positiveness may develop negative ordinates so far out in the tail that the
defect is of no consequence.

It is of interest to note the relation of these regions to those of the Karl Pearson and the Johnson (1949)
systems. Some of the points for which the Gram-Charlier is positive-definite lie in the Pearson Type VI
area (which lies between the Type III and V lines), but the regions for which both the Edgeworth and
Gram-Charlier curves are both unimodal and positive-definite lie almost wholly in the Type IV area,
i.e. 'below' the Type V line. These same regions fall wholly within the $S_U$ area of the Johnson system,
i.e. 'below' the log-normal line. A considerable amount of computation has been carried out comparing
these distributions which will be summarized in a subsequent note.
Comparison of analysis of variance power functions in the parametric
and random models 


By N. L. JOHNSON, University College, London 


1. Two alternative forms of theoretical model are commonly accepted as a basis for the tests employed
in the analysis of variance. In the case of data arranged in k groups of n observations each
(represented by the symbols $x_{ti}$; t = 1, ..., k; i = 1, ..., n), these models are:

I. Parametric model:   $x_{ti} = A + B_t + z_{ti}$;     (1)

II. Random model:   $x_{ti} = A + u_t + z_{ti}$.     (2)

A is a parameter; the B's are also parameters, satisfying the condition $\sum B_t = 0$. The u's and z's are
mutually independent random variables, each with expected value equal to zero. The u's have a common
standard deviation σ′; the z's have a common standard deviation σ. It will be further assumed that each
u and each z is normally distributed.

The ratio

$R = \dfrac{\text{'between-groups' mean square}}{\text{'within-groups' mean square}}$

is used to test the hypothesis $B_1 = B_2 = ... = B_k = 0$ if model I is deemed appropriate, or the hypothesis
σ′ = 0 if model II is selected. The hypothesis tested is formally rejected at the 100α % level of significance
if $R > F_{\nu_1,\nu_2,\alpha}$, where $\nu_1 = k - 1$, $\nu_2 = k(n - 1)$, and $F_{\nu_1,\nu_2,\alpha}$ is the upper 100α % point of the $F_{\nu_1,\nu_2}$
distribution.


2. In model I it can be shown (Tang, 1938) that the power of this test with respect to a particular set of
B's is a function of $(\sum B_t^2)/\sigma^2$.

In model II it can be shown (Johnson, 1948) that the power of the test with respect to a particular
value of σ′ is a function of $\sigma'^2/\sigma^2$.
Patnaik (1949) has compared the two power functions in a few special cases. In order to carry out this
comparison it was necessary to decide upon comparable values of $\sum B_t^2$ and $\sigma'^2$. The quantity $S^2 = (\sum B_t^2)/k$
may be regarded as a measure of the variance between the (fixed) group means in model I, and Patnaik
decided to regard hypotheses in models I and II as comparable if $S^2 = \sigma'^2$.

In the cases he considered, Patnaik found that, for such comparable hypotheses, the power function in
the parametric model, $\beta_{I}$ say, was always greater than the power function, $\beta_{II}$ say, in the random model.
He adds: 'We believe this result to be true in general and on intuitional grounds it might be expected.'


3. It is the purpose of this note to give a general, though somewhat approximative, appreciation of the
comparative values of $\beta_{I}(\lambda_1)$ and $\beta_{II}(\lambda_2)$, where

$\lambda_1 = knS^2/\sigma^2$   and   $\lambda_2 = kn\sigma'^2/\sigma^2$.

An indirect method of approach will be used. Instead of comparing the values of $\beta_{I}$ and $\beta_{II}$ for equal
values of $\lambda_1$ and $\lambda_2$, the values $\lambda_1(\beta)$, $\lambda_2(\beta)$ for which $\beta_{I} = \beta$, $\beta_{II} = \beta$ respectively will be compared. Since
the β's are increasing functions of the corresponding λ's, the existence of a β (> α) for which $\lambda_1(\beta) > \lambda_2(\beta)$
implies that for some $\lambda_1 = \lambda_2 = \lambda$, $\beta_{I} < \beta_{II}$, and conversely. It will, in fact, be convenient to introduce the
new parameters

$\theta_1 = \lambda_1/\nu_1 = \lambda_1/(k - 1)$,   $\theta_2 = \lambda_2/(\nu_1 + 1) = \lambda_2/k$,

and to compare the values $\theta_1(\beta)$, $\theta_2(\beta)$ for which $\beta_{I} = \beta$, $\beta_{II} = \beta$ respectively. The comparison of $\lambda_1(\beta)$
and $\lambda_2(\beta)$ is, of course, easily derived from that of $\theta_1(\beta)$ and $\theta_2(\beta)$.

It may be noted that hypotheses for which $\theta_1 = \theta_2$ might be regarded as comparable. For such pairs of
hypotheses $\sigma'^2 = (\sum B_t^2)/(k - 1)$, which is a plausible measure of variance between the (fixed) group means
alternative to $S^2$.
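
The two power functions can be compared empirically for equal $\lambda_1 = \lambda_2 = \lambda$ by random sampling. The sketch below (Python) is a Monte Carlo illustration under assumptions made here, not part of the argument of this note: σ is taken as 1, the B's in model I are placed at (c, −c, 0, ..., 0) with c chosen so that $n\sum B_t^2/\sigma^2 = \lambda$, and the critical value 3.2389 is the tabled upper 5 % point of F with (3, 16) degrees of freedom, corresponding to the assumed design k = 4, n = 5.

```python
import math
import random


def f_ratio(groups):
    """Ratio R of 'between-groups' to 'within-groups' mean square."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    return between / within


def power(model, lam, k=4, n=5, f_crit=3.2389, sims=10000, seed=0):
    """Estimated power at lambda = lam under model 'I' (parametric) or 'II' (random)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        if model == "I":
            c = math.sqrt(lam / (2.0 * n))        # n * sum(B_t^2) = lam, sigma = 1
            shifts = [c, -c] + [0.0] * (k - 2)
        else:
            sd_u = math.sqrt(lam / (k * n))       # k * n * sigma'^2 = lam, sigma = 1
            shifts = [rng.gauss(0.0, sd_u) for _ in range(k)]
        groups = [[s + rng.gauss(0.0, 1.0) for _ in range(n)] for s in shifts]
        if f_ratio(groups) > f_crit:
            hits += 1
    return hits / sims


if __name__ == "__main__":
    for lam in (0.0, 2.0, 5.0, 10.0, 20.0):
        print(lam, power("I", lam), power("II", lam))
```

The estimates are subject to sampling error of order $1/\sqrt{\text{sims}}$, so that small differences between the two models at low λ would require many more trials to establish.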


4. For model I, Patnaik obtained the approximate formula

$\beta_{I}(\lambda_1) = \Pr.\{F_{\nu,\nu_2} > F_{\nu_1,\nu_2,\alpha}\,(1 + \lambda_1\nu_1^{-1})^{-1}\}$,     (3)
where $\nu = (\nu_1 + \lambda_1)^2/(\nu_1 + 2\lambda_1)$.


This approximation is good, and probably sufficient for the present general appraisal, even for small
values of $\nu_1$.


For model II we have exactly


$\beta_{II}(\lambda_2) = \Pr.\{F_{\nu_1,\nu_2} > F_{\nu_1,\nu_2,\alpha}\,(1 + \lambda_2(\nu_1 + 1)^{-1})^{-1}\}$.     (4)
From (3)   $1 + \theta_1(\beta) = F_{\nu_1,\nu_2,\alpha}/F_{\nu,\nu_2,\beta}$     (5)
and from (4)   $1 + \theta_2(\beta) = F_{\nu_1,\nu_2,\alpha}/F_{\nu_1,\nu_2,\beta}$,     (6)
whence   $(1 + \theta_1(\beta))/(1 + \theta_2(\beta)) = F_{\nu_1,\nu_2,\beta}/F_{\nu,\nu_2,\beta}$.     (7)
Hence, to our degree of approximation   $\theta_1(\beta) \gtrless \theta_2(\beta)$,
according as   $F_{\nu_1,\nu_2,\beta} \gtrless F_{\nu,\nu_2,\beta}$.
Now   $\nu = \nu_1(1 + \theta_1)^2/(1 + 2\theta_1) > \nu_1$   (since $\theta_1 > 0$),


and from a study of tables of significance limits of the F-distribution we find that

(a) $F_{\nu_1,\nu_2,\beta} < F_{\nu,\nu_2,\beta}$ for β > 0.50;

(b) $F_{\nu_1,\nu_2,\beta} > F_{\nu,\nu_2,\beta}$ for 0.001 < β < 0.10 and $\nu_2 > 3$;

(c) when β = 0.20, $F_{\nu_1,\nu_2,\beta} > F_{\nu,\nu_2,\beta}$ for $\nu_2 \ge 15$, while $F_{\nu,\nu_2,\beta}$ has a maximum at about ν = 2 or 3 for
$3 < \nu_2 < 15$;

(d) when β = 0.25, $F_{\nu,\nu_2,\beta}$ has a maximum at ν = 2 for $\nu_2 > 13$, and has a maximum at about ν = 3–5
for $4 < \nu_2 < 13$.

In cases (c) and (d) $F_{\nu,\nu_2,\beta}$ decreases as ν increases above the value of ν for which $F_{\nu,\nu_2,\beta}$ is a maximum.

From this we deduce that

(i) $\theta_1(\beta) < \theta_2(\beta)$ if β > 0.50;

(ii) $\theta_1(\beta) > \theta_2(\beta)$ for relatively small departures from the null hypothesis;

(iii) the power curves for $\theta_1 = \theta_2$ will usually cross between β = 0.20 and β = 0.50. The smaller $\nu_1$,
i.e. the fewer the number of groups, the lower the value of β at the crossing point.

5. Since $\lambda_1/\lambda_2 = (\theta_1/\theta_2)\,(\nu_1/(\nu_1+1))$, (i) above indicates that Patnaik's conjecture that $\beta_{\mathrm{I}} > \beta_{\mathrm{II}}$ for $\lambda_1 = \lambda_2 = \lambda$ is certainly true for sufficiently large values of $\lambda$.

On the other hand, if
$$F_{\nu_1,\nu_2,\beta}/F_{\nu,\nu_2,\beta} > (\nu_1+1)/\nu_1, \tag{8}$$
it follows from (7) that $\theta_1(\beta)/\theta_2(\beta) > (\nu_1+1)/\nu_1$ and hence $\lambda_1(\beta) > \lambda_2(\beta)$. This would imply that for some
$\lambda_1 = \lambda_2 = \lambda$, $\beta_{\mathrm{I}} < \beta_{\mathrm{II}}$. It is easy to find cases where (8) holds, e.g. $F_{9,30,0.10}/F_{40,30,0.10} = 1.175 > 1.111$, but
it is found empirically that the corresponding values of $\theta_1$ (and so $\lambda_1$) are so large that the test employed
must have a very low significance level. For such tests it seems likely that there will be a range of values
of $\lambda$ (which must correspond to values of $\beta$ less than 0.5) for which $\beta_{\mathrm{I}} < \beta_{\mathrm{II}}$ when $\lambda_1 = \lambda_2 = \lambda$, so that
Patnaik's conjecture would be false. The fact that the significance level of the test must be small does not
greatly affect the accuracy of the approximation (3), since this is dependent on the value of $F_{\nu_1,\nu_2,\alpha}/(1+\theta_1)$
and not on $F_{\nu_1,\nu_2,\alpha}$ alone.

Although it is unlikely that Patnaik's conjecture is universally true it appears that for the usual range
of significance levels ($0.001 < \alpha < 0.10$) the general relationship between $\beta_{\mathrm{I}}$ and $\beta_{\mathrm{II}}$ for $\lambda_1 = \lambda_2 = \lambda$ is
roughly as follows:

(i) $\beta_{\mathrm{I}}$ is slightly greater than $\beta_{\mathrm{II}}$ for small $\lambda$;

(ii) $\beta_{\mathrm{I}}$ approaches unity much more rapidly than $\beta_{\mathrm{II}}$, the difference becoming more and more noticeable
as $\beta$ increases above 0.5.


6. An alternative approach of a somewhat more general nature may be developed as follows. 

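We can consider the random model as specifying the frequencies with which various values of the
$\beta_i$'s occur in an infinite sequence of parametric models. The random model will thus specify the distribution
of $\theta_1$, $p(\theta_1 \mid \theta_2)$, say, in the sequence of parametric models. Hence it follows that
$$\beta_{\mathrm{II}}(\theta_2) = \int_0^\infty \beta_{\mathrm{I}}(\theta_1)\,p(\theta_1 \mid \theta_2)\,d\theta_1.$$
Expanding $\beta_{\mathrm{I}}(\theta_1)$ in Taylor's series about the point $\theta_1 = \theta_2$, we obtain
$$\beta_{\mathrm{II}}(\theta_2) = \beta_{\mathrm{I}}(\theta_2) + \tfrac12\mu_2(\theta_1 \mid \theta_2)\,\beta''_{\mathrm{I}}(\theta_2) + \tfrac16\mu_3(\theta_1 \mid \theta_2)\,\beta'''_{\mathrm{I}}(\theta_2) + \dots, \tag{9}$$
provided $\mu'_1(\theta_1 \mid \theta_2) = \theta_2$.
In the present case $\Sigma\beta_i^2 = \Sigma(\mu_i-\bar\mu)^2$, and so $\theta_1$, given $\theta_2$, is distributed as $\theta_2\chi^2_{k-1}/(k-1)$.
Hence
$$\mu'_1(\theta_1 \mid \theta_2) = \theta_2,$$
$$\mu_2(\theta_1 \mid \theta_2) = 2\theta_2^2/(k-1),$$
$$\mu_3(\theta_1 \mid \theta_2) = 8\theta_2^3/(k-1)^2.$$

In a typical power curve $\beta''(\theta)$ is positive for small $\theta$, up to a point of inflexion which is usually near the
point $\beta = \tfrac12$. Thereafter $\beta''(\theta)$ is negative. $\beta'''(\theta)$ is usually negative for all $\theta$. We would thus expect to find
$\beta_{\mathrm{II}}(\theta) > \beta_{\mathrm{I}}(\theta)$ for small $\theta$ but $\beta_{\mathrm{II}}(\theta) < \beta_{\mathrm{I}}(\theta)$ for larger $\theta$, as we have in fact inferred in § 4.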
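
The three conditional moments used in the expansion can be checked from the cumulants of $\chi^2$. The following is a minimal Python sketch, assuming (as stated above) that $\theta_1$ given $\theta_2$ is distributed as $\theta_2\chi^2_{k-1}/(k-1)$; the function name `conditional_moments` is hypothetical and is introduced only for illustration.

```python
from fractions import Fraction

def conditional_moments(theta2, k):
    """Mean, variance and third central moment of theta2 * chi2_{k-1} / (k-1)."""
    nu = k - 1
    scale = Fraction(theta2) / nu
    # A chi-square variable with nu d.f. has cumulants 2**(r-1) * (r-1)! * nu;
    # for r = 1, 2, 3 these are the mean, variance and third central moment.
    chi2_cumulants = [nu, 2 * nu, 8 * nu]
    # Scaling a variable by c multiplies its r-th cumulant by c**r.
    return tuple(scale ** (r + 1) * chi2_cumulants[r] for r in range(3))

theta2 = Fraction(3, 2)
mean, var, mu3 = conditional_moments(theta2, k=5)
```

With $\theta_2 = 3/2$ and $k = 5$ the three values agree with $\theta_2$, $2\theta_2^2/(k-1)$ and $8\theta_2^3/(k-1)^2$.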
On exact grouping corrections to moments and cumulants*
By MORTON KUPPERMAN, Arlington, Virginia 


1. INTRODUCTION 


The usual derivations of the Sheppard corrections to moments and cumulants of grouped frequency 
distributions either require the making of certain assumptions about the nature of the parent (ungrouped) 
distribution (such as high-order terminal contact) or consider the corrections from the viewpoint of 
average corrections, the results of the latter method being true not for any one particular grouping mesh 
but for all grouping meshes on the average, assuming a uniform distribution of the mid-points of the 


* Based in part on a thesis submitted in partial satisfaction of the requirements for the M.A. degree at The 
George Washington University. 
grouping intervals, the mid-points being randomly selected. The former method of derivation usually
requires the use of the Euler-Maclaurin sum formula, whereas the latter method is easily carried out by 
means of characteristic functions. (See the book by Kendall (1947) for a discussion of the various 
methods and assumptions involved in corrections for grouping.) 

Inasmuch as the formulas thus derived are not ‘exact’, it is of interest to consider a few types of 
simple frequency distributions and to attempt to derive exact ‘Sheppard’ corrections for grouping for 
a predetermined grouping mesh, not randomly chosen. 


2. RECTANGULAR DISTRIBUTION 
The frequency function is given by $f(x) = 1/(2a)$, where $-a<x<a$. The characteristic function of the
rectangular distribution is
$$\phi(t) = \int_{-a}^{a} e^{itx}\,\frac{dx}{2a} = \frac{\sin at}{at}.$$
The cumulants are given in the expansion of $\log\phi(t)$ as an infinite series:
$$\log\phi(t) = \frac{(iat)^2}{6} - \frac{(iat)^4}{180} + \frac{(iat)^6}{2835} - \dots + \frac{2^{2r}B_{2r}(iat)^{2r}}{2r\,(2r)!} + \dots,$$
provided $\sin at>0$ and $a^2t^2<\pi^2$. We therefore choose $t$ such that $0<t<\pi/a$. Hence all odd cumulants
vanish and the even cumulants are given by $\kappa_{2r} = 2^{2r}B_{2r}a^{2r}/(2r)$. (The $B_{2r}$ are the Bernoulli numbers
with even subscript. The first 11 Bernoulli numbers (with odd or even subscript) are: $B_0 = 1$, $B_1 = -\tfrac12$,
$B_3=B_5=\dots=0$, $B_2=\tfrac16$, $B_4 = -\tfrac1{30}$, $B_6 = \tfrac1{42}$, $B_8 = -\tfrac1{30}$, $B_{10} = \tfrac5{66}$. Except for $B_1$ the Bernoulli
numbers with odd subscript all vanish.)

Now let us consider a particular grouped rectangular distribution. We divide the range $2a$ into $k$ (any
positive integer) equal intervals each of length $h$. The mid-points of these $k$ intervals take the values
$-a+\tfrac12h,\ -a+\tfrac32h,\ \dots,\ -a+\tfrac12(2k-1)h$, and the class boundaries are $-a,\ -a+h,\ \dots,\ -a+kh = a$. The
characteristic function of the grouped distribution is given by
$$\Phi(t) = \frac{(\sin at)/(at)}{(\sin\tfrac12ht)/(\tfrac12ht)}.$$
Thus we find
$$\kappa_{2r} = \bar\kappa_{2r} + \frac{B_{2r}h^{2r}}{2r}, \tag{1}$$
$$\kappa_1 = \bar\kappa_1, \tag{2}$$
$$\kappa_{2r+1} = \bar\kappa_{2r+1}, \tag{3}$$

where $\kappa_r$ represents the rth cumulant of the ungrouped rectangular distribution and $\bar\kappa_r$ represents the
rth cumulant of the grouped rectangular distribution. It is interesting to note that equation (1) is of
the same form (but with a plus sign following $\bar\kappa_{2r}$ instead of a minus sign) as the Sheppard correction to
cumulants derived by Langdon & Ore (1930) (see also Kendall, 1947) for any grouped frequency distribution,
with the usual assumptions under which the corrections to moments are derived.
From (2) and (3) it is seen that no odd cumulants (and therefore no odd moments about the mean) 
require correction for grouping. For the even moments about the mean we find
$$\mu_2 = \bar\mu_2 + \tfrac1{12}h^2, \tag{4}$$
$$\mu_4 = \bar\mu_4 + \tfrac12h^2\bar\mu_2 + \tfrac1{80}h^4. \tag{5}$$
These corrections for the particular cases of $\mu_2$ and $\mu_4$ (equations (4) and (5)) for a grouped rectangular
distribution are due to Elderton (1938) and are also given by Kendall (1947) as an exercise. As Kendall
points out, the expression for $\mu_2$ is the usual expression for the Sheppard correction to the variance of
a grouped frequency distribution but with a corrective factor of $+\tfrac1{12}h^2$ instead of $-\tfrac1{12}h^2$. We also find
$$\mu_6 = \bar\mu_6 + \tfrac54h^2\bar\mu_4 + \tfrac3{16}h^4\bar\mu_2 + \tfrac1{448}h^6, \tag{6}$$
$$\mu_8 = \bar\mu_8 + \tfrac73h^2\bar\mu_6 + \tfrac78h^4\bar\mu_4 + \tfrac1{16}h^6\bar\mu_2 + \tfrac1{2304}h^8. \tag{7}$$
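
Equations (4) and (5) can be verified in exact rational arithmetic. The sketch below is illustrative only: the helper `grouped_moment` and the values $a = 3$, $k = 4$ are assumptions, not part of the original note. It places frequency $1/k$ at each class mid-point, applies the two corrections, and compares the results with the ungrouped moments $a^2/3$ and $a^4/5$.

```python
from fractions import Fraction

def grouped_moment(a, k, n):
    """n-th moment about the mean (zero) of the rectangular distribution on
    (-a, a) grouped into k equal classes, each carrying frequency 1/k."""
    h = Fraction(2 * a, k)
    mids = [-a + (2 * j + 1) * h / 2 for j in range(k)]
    return sum(m ** n for m in mids) / k

a, k = 3, 4                      # illustrative values
h = Fraction(2 * a, k)
m2_bar = grouped_moment(a, k, 2)
m4_bar = grouped_moment(a, k, 4)

mu2 = m2_bar + h ** 2 / 12                         # equation (4)
mu4 = m4_bar + h ** 2 * m2_bar / 2 + h ** 4 / 80   # equation (5)
```

Both corrected values agree exactly with the moments of the ungrouped distribution for any positive integer $k$.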
3. TRIANGULAR DISTRIBUTION 


The frequency function of the triangular distribution is given by 


$$f(x) = \begin{cases} \dfrac{1}{a^2}(a+x) & (-a<x<0), \\[4pt] \dfrac{1}{a^2}(a-x) & (0<x<a). \end{cases}$$

The characteristic function of the distribution is
$$\phi(t) = 2(1-\cos at)/(a^2t^2).$$
The odd moments vanish and $\mu_{2r} = a^{2r}/\{(r+1)(2r+1)\}$.
Since
$$\phi(t) = \left(\frac{\sin\tfrac12at}{\tfrac12at}\right)^2,$$


the cumulative function is given by $2(\log\sin\tfrac12at - \log\tfrac12at)$. (Incidentally, it is seen from the form of
$\phi(t)$ that the triangular distribution represents the sampling distribution of the mean of a sample of two
observations independently and randomly drawn from a rectangular population.) Now $\log\phi(t)$ as here
given is similar to the cumulative function derived in the preceding section for the rectangular distribution,
and hence the odd cumulants vanish and the even cumulants are given by $\kappa_{2r} = B_{2r}a^{2r}/r$.

Now let us consider the case of a grouped triangular distribution when the range 2a is divided into 
an even number 2m of intervals each of size h. Hence 2mh = 2a or mh = a. The area under the frequency 
curve for each grouping interval is the area of the trapezoid formed by the ordinates at the ends of the 
grouping interval, the frequency curve itself, and the x-axis. Hence it may be shown that the char- 
acteristic function of the grouped triangular distribution is given by 


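$$\Phi(t) = \frac{h^2\sin^2\tfrac12at\,\cos\tfrac12ht}{a^2\sin^2\tfrac12ht}.$$
Then $$\phi(t)/\Phi(t) = (\sin\tfrac12ht\,\tan\tfrac12ht)/(\tfrac12ht)^2.$$
From this we find on expansion
$$\kappa_{2r} = \bar\kappa_{2r} - \frac{(2^{2r}-3)\,B_{2r}h^{2r}}{2r}. \tag{8}$$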

This is the general expression for correcting the even cumulants of a triangular distribution grouped 
into an even number of intervals. As was the case in the rectangular distribution, the odd cumulants 
of the grouped triangular distribution (even number of grouping intervals) need no correction. 


We also find
$$\mu_2 = \bar\mu_2 - \tfrac1{12}h^2, \tag{9}$$
$$\mu_4 = \bar\mu_4 - \tfrac12h^2\bar\mu_2 + \tfrac{31}{240}h^4. \tag{10}$$

It will be noted that the exact correction to the variance is exactly the usual form of the Sheppard
correction and that the exact correction to the fourth moment about the mean differs from the usual
form of the Sheppard correction ($\mu_4 = \bar\mu_4 - \tfrac12h^2\bar\mu_2 + \tfrac7{240}h^4$) by the value $\tfrac1{10}h^4$.
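
The corrections for an even number of intervals can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the function name and the values $a = 2$, $m = 3$ are hypothetical, and the coefficient $31/240$ of $h^4$ is taken to be the usual Sheppard coefficient $7/240$ increased by the $\tfrac1{10}$ mentioned in the text. Class frequencies are the trapezoidal areas under the frequency curve, and the ungrouped moments are $a^2/6$ and $a^4/15$.

```python
from fractions import Fraction

def grouped_triangular_moments(a, m):
    """Second and fourth moments about the mean of the triangular distribution
    on (-a, a) grouped into 2m classes of width h = a/m."""
    a = Fraction(a)
    h = a / m
    m2 = m4 = Fraction(0)
    for j in range(m):
        mid = (j + Fraction(1, 2)) * h
        area = h * (a - mid) / a ** 2   # area under f(x) = (a - x)/a^2 over the class
        m2 += 2 * area * mid ** 2       # factor 2 accounts for the mirror-image class
        m4 += 2 * area * mid ** 4
    return h, m2, m4

h, m2_bar, m4_bar = grouped_triangular_moments(a=2, m=3)
mu2 = m2_bar - h ** 2 / 12
mu4 = m4_bar - h ** 2 * m2_bar / 2 + Fraction(31, 240) * h ** 4
```

For $a = 2$ the corrected values are $2/3$ and $16/15$, the exact second and fourth moments of the ungrouped distribution.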

In the case when the range is divided into an odd number of intervals, the expression for the char- 
acteristic function of the grouped distribution becomes a little more complicated and will not be given 
here. We may, however, derive by elementary means the exact correction to be applied to the variance
in this case. (It is obvious that because of symmetry the mean and all odd moments about the mean 
require no correction for grouping.) 

Let the range $2a$ be divided into $2m+1$ classes of length $h$. Then $(2m+1)h = 2a$. The central interval
has as its mid-point the origin; the area under the frequency curve corresponding to this central interval
is $\dfrac{h}{a} - \dfrac{h^2}{4a^2}$. On each side of the central interval there are $m$ intervals, centred at $+h, +2h, \dots, +mh$, and
also at $-h, -2h, \dots, -mh$. The areas corresponding to these classes are respectively
$$\frac{h}{a^2}(a-h),\quad \frac{h}{a^2}(a-2h),\quad \dots,\quad \frac{h}{a^2}(a-mh).$$

Hence the variance of this grouped triangular distribution, with an odd number of classes or grouping
intervals, is by definition
$$\bar\mu_2 = (0)^2\left(\frac{h}{a}-\frac{h^2}{4a^2}\right) + 2\left\{h^2\,\frac{h}{a^2}(a-h) + (2h)^2\,\frac{h}{a^2}(a-2h) + \dots + (mh)^2\,\frac{h}{a^2}(a-mh)\right\},$$
which reduces to $\bar\mu_2 = \tfrac16a^2 + \tfrac1{12}h^2 - \dfrac{h^4}{32a^2}$. Remembering that in the ungrouped triangular distribution
$\mu_2 = \tfrac16a^2$ we have finally
$$\mu_2 = \bar\mu_2 - \tfrac1{12}h^2\left(1-\frac{3h^2}{8a^2}\right). \tag{11}$$

This is an exact correction, but not as simple in appearance as the correction in the even case. It is
obvious from (11) that the use of the usual expression for the Sheppard correction will in this case
'overcorrect' the variance and leave us with a value that is too low by an amount equal to $h^4/32a^2$.
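
The odd case can be checked in the same way. This is a hedged sketch: the function name and the choice $a = 5$, $m = 2$ (five classes of width $h = 2$) are assumptions made for illustration. It sums the class contributions directly, applies correction (11), and measures how far the ordinary Sheppard correction falls short.

```python
from fractions import Fraction

def grouped_triangular_variance_odd(a, m):
    """Variance of the triangular distribution on (-a, a) grouped into
    2m + 1 classes of width h, the central class being centred at the origin."""
    a = Fraction(a)
    h = 2 * a / (2 * m + 1)
    # The central class sits at zero and contributes nothing to the variance.
    var = sum(2 * (j * h) ** 2 * h * (a - j * h) / a ** 2 for j in range(1, m + 1))
    return h, var

a = Fraction(5)
h, var_bar = grouped_triangular_variance_odd(a, m=2)
corrected = var_bar - h ** 2 / 12 * (1 - 3 * h ** 2 / (8 * a ** 2))   # equation (11)
sheppard = var_bar - h ** 2 / 12                                      # usual correction
```

Here `corrected` equals $a^2/6$ exactly, while `sheppard` is smaller by $h^4/32a^2$.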


4. SEMI-TRIANGULAR DISTRIBUTION 


Let us now consider the effect of grouping on the mean and variance of a distribution very closely 
related to the triangular distribution, namely, a distribution whose frequency curve is in the shape of 
the right half of the frequency curve of the triangular distribution. For want of a name we may call this 
new distribution the semi-triangular distribution. It is obvious that each ordinate of the semi-triangular 
distribution is twice the ordinate of the related triangular distribution, so that the area under the curve 
(which is a segment of a straight line with y-axis intercept equal to 2/a and x-axis intercept equal to a) is 
unity. The equation of the semi-triangular distribution is then $f(x) = 2(a-x)/a^2$ for $0<x<a$. For this
distribution the mean equals $\tfrac13a$ and the variance is $\tfrac1{18}a^2$, the second moment about the origin being $\tfrac16a^2$.

Let us now divide the range a into equal intervals (either odd or even in number) each of size h. 
Proceeding as before for the triangular distribution, we may readily show that the exact grouping 
correction formulas are 


$$\mu'_1 = \bar\mu'_1 - h^2/6a, \tag{12}$$
$$\mu'_2 = \bar\mu'_2 - \tfrac1{12}h^2, \tag{13}$$
$$\mu_2 = \bar\mu_2 + \tfrac1{36}h^2\left(1+\frac{h^2}{a^2}\right). \tag{14}$$


Of interest is the result for $\mu'_2$, the second moment about the origin. This is the usual formula for the
Sheppard correction, and is an exact result, even though the standard derivations of the Sheppard
corrections are not valid for this distribution (absence of contact at the left terminal point of the range;
also, the grouping mesh is not randomly chosen). From the expressions for $\mu'_1$ and $\mu_2$ in (12) and (14)
respectively, we see that for the semi-triangular distribution the process of grouping the range into an
integral number of equal intervals overstates the mean and understates the variance, the amount of
error depending upon the size of the grouping interval and being more for larger intervals and less for
smaller intervals.
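
A short exact check of corrections (12) to (14) follows. It is only a sketch: the function name and the values $a = 3$, $n = 3$ are assumptions chosen for illustration. Each class frequency is the area under $f(x) = 2(a-x)/a^2$ over the class, placed at the class mid-point.

```python
from fractions import Fraction

def grouped_semi_triangular(a, n):
    """Mean, second moment about the origin and variance of the semi-triangular
    distribution on (0, a) grouped into n equal classes of width h = a/n."""
    a = Fraction(a)
    h = a / n
    mean = second = Fraction(0)
    for j in range(n):
        mid = (j + Fraction(1, 2)) * h
        area = 2 * h * (a - mid) / a ** 2   # frequency of the class
        mean += area * mid
        second += area * mid ** 2
    return h, mean, second, second - mean ** 2

a = Fraction(3)
h, mean_bar, second_bar, var_bar = grouped_semi_triangular(a, n=3)
mean = mean_bar - h ** 2 / (6 * a)                    # equation (12)
second = second_bar - h ** 2 / 12                     # equation (13)
var = var_bar + h ** 2 / 36 * (1 + h ** 2 / a ** 2)   # equation (14)
```

The corrected values reproduce $a/3$, $a^2/6$ and $a^2/18$, and the grouped mean exceeds the true mean while the grouped variance falls below the true variance.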


5. PARABOLIC DISTRIBUTION 


The parabolic distribution is a special case of Pearson’s Type II distribution, as is the rectangular 
distribution. (It may be remarked that Jeffreys (1948) states that he thinks that the parabolic and 
rectangular distributions should be given special numbers in his suggested renumbering of the Pearson 
frequency curves.) Taking the mean of the parabolic distribution as the origin, the equation of the 
distribution is $f(x) = 3(a^2-x^2)/4a^3$, where $-a<x<a$. This curve is symmetrical about the axis of
ordinates and is unimodal. The variance is $\tfrac15a^2$.

The area under the parabola and contained between two ordinates erected at the points $x = sh$ and
$x = (s+1)h$, which are chosen to fall within the range of the variable $x$ ($-a \le x \le a$), where $s$ is a real
number and $h$ is positive, is found to be
$$\frac{3h}{4a^3}\left\{(a^2-\tfrac13h^2) - s(s+1)h^2\right\}.$$


We shall here consider only the exact correction to be made to the variance of the grouped distribution.
Although the same result will be found to be true for division of the range 2a into an odd or an even 
number of grouping intervals, it will be easier to derive the correction by separate consideration of the 
odd and even cases. 

Let the range be divided into an even number 2m of equal intervals of size h, so that 2mh = 2a. The 
variance of the grouped distribution is then found by ordinary summation, leading to the result 


$$\mu_2 = \bar\mu_2 - \tfrac1{12}h^2\left(1-\frac{2h^2}{5a^2}\right). \tag{15}$$
Because $h<a$, the quantity in parentheses is always positive and less than unity, so that $\mu_2<\bar\mu_2$. That
is to say, the variance of the grouped parabolic distribution is always greater than the variance of the
ungrouped parabolic distribution. This statement holds true for both an even and an odd number of
grouping intervals.

When the range is divided into an odd number of intervals, the result may be shown to hold by
a similar method.











6. A NUMERICAL EXAMPLE
The parabolic distribution obtained by assigning to the parameter $a$ the particular value $\tfrac12$ is $f(x) = 6(\tfrac14 - x^2)$
for $-\tfrac12<x<\tfrac12$. The size of the grouping interval will be chosen as $h = 0.1$, giving 10 grouping intervals.
The following table gives the frequencies in these intervals:

Interval          f = frequency
-0.5 to -0.4      0.028
-0.4 to -0.3      0.076
-0.3 to -0.2      0.112
-0.2 to -0.1      0.136
-0.1 to  0.0      0.148
 0.0 to  0.1      0.148
 0.1 to  0.2      0.136
 0.2 to  0.3      0.112
 0.3 to  0.4      0.076
 0.4 to  0.5      0.028

Total             1.000


The mean of the grouped distribution is, because of symmetry, equal to zero. The variance is found to
be, exactly, $\bar\mu_2 = 0.050,82$. Were we to apply Sheppard's correction, $-\tfrac1{12}h^2 = -0.000,833$, to $\bar\mu_2$ we
should obtain 0.049,987 as the corrected variance. This is a small overcorrection, since the variance of
the ungrouped distribution is 0.05. The exact correction derived above is $-\tfrac1{12}h^2\left(1-\dfrac{2h^2}{5a^2}\right)$, which for
$h = 0.1$ and $a = 0.5$ has a value of exactly $-0.000,82$. Thus the corrected variance is
$$0.050,82 - 0.000,82 = 0.05,$$
an exact result.
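
The arithmetic of this example can be reproduced exactly. The following Python sketch (the function and variable names are illustrative, not the author's) builds the ten class frequencies from the area formula of section 5, computes the grouped variance, and applies both the ordinary Sheppard correction and the exact correction (15).

```python
from fractions import Fraction

a, h = Fraction(1, 2), Fraction(1, 10)

def class_frequency(s):
    """Area under f(x) = 3(a^2 - x^2)/(4a^3) between x = s*h and x = (s+1)*h."""
    return 3 * h / (4 * a ** 3) * ((a ** 2 - h ** 2 / 3) - s * (s + 1) * h ** 2)

# (mid-point, frequency) for the ten classes from -0.5 to 0.5
classes = [((s + Fraction(1, 2)) * h, class_frequency(s)) for s in range(-5, 5)]

var_bar = sum(f * mid ** 2 for mid, f in classes)   # grouped variance (mean is zero)
sheppard = var_bar - h ** 2 / 12                    # usual Sheppard correction
exact = var_bar - h ** 2 / 12 * (1 - 2 * h ** 2 / (5 * a ** 2))   # equation (15)
```

The frequencies sum to one, `var_bar` is 0.05082, `sheppard` is about 0.049987 and `exact` is 0.05.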


7. EXPONENTIAL DISTRIBUTION 


This is the distribution whose frequency function is given by $f(x) = \alpha e^{-\alpha x}$, where $0<\alpha$ and $0<x<\infty$.
The characteristic function is given by
$$\phi(t) = \frac{1}{1-it/\alpha}.$$
Now let us group the exponential distribution into an infinite number of groups, each interval of
size $h$, with the lower boundary of the first class (or group) being the origin. The frequency for each
grouping is the area under the curve for that particular interval and is given, in general, for an interval
of length $h$ and class boundaries of $sh$ and $(s+1)h$ by $e^{-\alpha sh}(1-e^{-\alpha h})$.
The characteristic function of the grouped distribution is then found to be
$$\Phi(t) = \frac{\sinh\tfrac12\alpha h}{\sinh(\tfrac12\alpha h - \tfrac12ith)}.$$
We shall not attempt here to derive the general expression for the moments or cumulants of the grouped 
exponential distribution by use of the characteristic function. Instead, by direct means, we can derive 
the mean and variance of the grouped distribution. For the mean we have 


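$$\bar\mu'_1 = \tfrac12h\coth\tfrac12\alpha h. \tag{16}$$
The exact correction to be applied to the mean of the grouped exponential distribution is then
$$\mu'_1 = \bar\mu'_1 - \left(\tfrac12h\coth\tfrac12\alpha h - \frac1\alpha\right). \tag{17}$$
Since $0<\tanh\tfrac12\alpha h<\tfrac12\alpha h$, we have $\tfrac12h\coth\tfrac12\alpha h - \dfrac1\alpha > 0$ and hence
$$\bar\mu'_1 > \mu'_1. \tag{18}$$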
That is, the mean of a grouped exponential distribution is always greater than the mean of the ungrouped 
exponential distribution, the amount of upward bias being the value of the expression in parentheses 
in formula (17). The result expressed by (18) may appear obvious, of course, by consideration of the 
nature of the grouping process in the case of this distribution. The variance is similarly found to be 
$$\bar\mu_2 = (\tfrac12h\,\mathrm{cosech}\,\tfrac12\alpha h)^2. \tag{19}$$
Therefore the exact correction to the variance is
$$\mu_2 = \bar\mu_2 + \frac{1}{\alpha^2}\left(1-\tfrac14\alpha^2h^2\,\mathrm{cosech}^2\,\tfrac12\alpha h\right). \tag{20}$$
Since $\sinh\tfrac12\alpha h > \tfrac12\alpha h>0$, we find that the quantity in parentheses in (20) is always positive. Hence
$$\bar\mu_2 < \mu_2. \tag{21}$$
That is, the variance of the grouped exponential distribution is always less than the variance of the 
ungrouped exponential distribution. This result and the corresponding one for the mean are true 
regardless of the size of the grouping interval h. 
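
Formulas (16) and (19) can be compared with a direct summation over the classes. The sketch below is illustrative: the function name, the truncation at 400 classes and the values $\alpha = 2$, $h = 0.5$ are assumptions. It sums the class contributions in floating point and compares the results with the closed forms.

```python
import math

def grouped_exponential(alpha, h, terms=400):
    """Mean and variance of the exponential distribution grouped into classes
    [s*h, (s+1)*h), s = 0, 1, 2, ..., by direct (truncated) summation."""
    q = math.exp(-alpha * h)
    mean = second = 0.0
    for s in range(terms):
        p = q ** s * (1.0 - q)        # class frequency e^{-alpha s h}(1 - e^{-alpha h})
        mid = (s + 0.5) * h
        mean += p * mid
        second += p * mid * mid
    return mean, second - mean * mean

alpha, h = 2.0, 0.5
mean_bar, var_bar = grouped_exponential(alpha, h)
x = 0.5 * alpha * h
mean_formula = 0.5 * h / math.tanh(x)          # equation (16)
var_formula = (0.5 * h / math.sinh(x)) ** 2    # equation (19)
```

For these values the grouped mean exceeds $1/\alpha$ and the grouped variance is less than $1/\alpha^2$, in line with (18) and (21).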
Discrimination in time-series analysis


By A. RUDRA 
Department of Statistics, University College, London 


1. Discrimination in time series analysis will often reduce to choosing between autoregressive (A.R.) 
and moving average (M.A.)* schemes, and deciding on the appropriate order. We accept the principle 
that it is desirable to choose the lowest order possible for the scheme selected and this leads us to the 
following empirical sequential test procedure. 

Stage I. Test for randomness. If the result is significant we proceed to stage II.

Stage II. Test whether the scheme could be first-order A.R. (i.e. not a higher-order A.R.). Test whether
the scheme could be first-order M.A. (i.e. not a higher-order M.A.). If the A.R. test of the first order is
significant but the M.A. first-order test is not significant, we decide that M.A. first order is a suitable
structure for our series. If the reverse holds we decide in favour of first-order A.R. If neither test
produces significant results we hold that either scheme would be suitable and go no further. If both
tests are significant we proceed to stage III.


Stage III. We test whether the scheme could be second-order A.R. We test whether the scheme could
be second-order M.A. The decisions are as for stage II. If both are significant we proceed to third-order
tests and so on.

Thus for the (s+1)th stage of the test, if we reach there by means of two significant decisions at
each of the preceding stages, we shall have the following 'decision regions' (based on separate 5%
significance levels) in the sample space of the criteria $d_{.(s+1)}$, $d_{s+1}$, defined below:

[Figure: decision regions in the plane of the two criteria, with $d_{.(s+1)}$ on the vertical axis. The central square, bounded by the 95% regions of the two criteria, is labelled 'M.A. (s) or A.R. (s)'; the regions above and below it are labelled 'Moving average of order s'; the regions to its left and right are labelled 'Autoregressive of order s'; the four corner regions are labelled 'Higher order scheme'.]

2. Let there be a sequence of random variables $x_i$ ($i = 1, 2, \dots, N$) corresponding to a sequence of
observations $X_i$ ($i = 1, 2, \dots, N$) at equidistant time points. Let $\rho_s$ denote the correlation between $x_i$
and $x_{i+s}$ in the population and let $\rho_{.s}$ denote the corresponding partial correlation coefficient between
$x_i$ and $x_{i+s}$ for $x_{i+1}, \dots, x_{i+s-1}$ constant. These population correlations have the following properties:
(a) If the scheme is A.R. of order k, then
$\rho_{.s} \ne 0$ for $s \le k$,
$\rho_{.s} = 0$ for $s > k$,
$\rho_s \ne 0$ for $s = 0, 1, 2, \dots, \infty$.
(b) If the scheme is M.A. of order k, then
$\rho_s \ne 0$ for $s \le k$,
$\rho_s = 0$ for $s > k$,
$\rho_{.s} \ne 0$ for $s = 0, 1, 2, \dots, \infty$.

* To save space we shall abbreviate, writing A.R. for autoregressive and M.A. for moving average.


It is these properties which suggest the basis for the sequential scheme put forward in the first
section. At the sth stage of the procedure the test of the hypothesis of a moving average structure is
equivalent to testing that $\rho_s = 0$. The criterion which we would use for this test is
$$d_s = \frac{r_s}{\left\{\dfrac1N\,(1+2r_1^2+\dots+2r_{s-1}^2)\right\}^{1/2}},$$
where $N$ is the number of observations and $r_s$ is the sample correlation coefficient between $x_i$ and $x_{i+s}$.
It is seen that $d_s$ is an approximation to
$$\frac{r_s}{\left\{\dfrac1N\,(1+2\rho_1^2+\dots+2\rho_{s-1}^2)\right\}^{1/2}},$$
where the expression in the denominator is the standard error of $r_s$ when $\rho_i = 0$, for $i>s$, from Bartlett's
result (1946). The testing of the hypothesis of an autoregressive scheme at the sth stage is equivalent
to testing that $\rho_{.s} = 0$, using for this purpose the criterion
$$d_{.s} = \frac{r_{.s}\sqrt N}{(1-r_{.s}^2)^{1/2}}.$$
$d_{.s}$ is the likelihood ratio criterion for testing the hypothesis $\beta_s = 0$ in an autoregressive scheme of
order s, where $\beta_s$ is the sth regression coefficient. In both cases we make the assumption that $N$ is
large enough to justify the assumption that both $d_s$ and $d_{.s}$ can be considered as unit normal variables
under the null hypothesis. The correlation coefficients are calculated about mean zero, using the
product moment definition.

For the case when $s = 1$ we have a seeming lack of consistency in the two tests, for, strictly, they
should both be the same at this stage. Thus if we set up the criteria
(a) $|d_1| > 1.96$ and (b) $|d_{.1}| > 1.96$,
we have that they are equivalent to (a) $|r_1| > 1.96/\sqrt N$,
and (b) $|r_1| > 1.96/(N + (1.96)^2)^{1/2}$.

The two tests are seen not to be equivalent unless $N$ be large. For the cases we are considering, when
$N$ is supposed large enough for $r$ to be considered normally distributed, it is clear that tests (a) and (b)
will lead to the same result.
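
How close the two critical values are can be seen numerically. The following is a minimal Python sketch of the two thresholds for $|r_1|$ implied by criteria (a) and (b); the function name `critical_r1` and the sample sizes used are illustrative assumptions.

```python
import math

def critical_r1(n, z=1.96):
    """Critical values of |r_1| implied by |d_1| > z (moving average test)
    and by |d_.1| > z (autoregressive test) for n observations."""
    moving_average = z / math.sqrt(n)
    autoregressive = z / math.sqrt(n + z * z)
    return moving_average, autoregressive

ma_100, ar_100 = critical_r1(100)
ma_10000, ar_10000 = critical_r1(10000)
```

The autoregressive threshold is always the smaller of the two, and the gap between them shrinks as $N$ grows.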

The fact that it is not only possible but likely that we are unable to discriminate when using this 
procedure is not in itself an undesirable feature. No observed series will be exactly of one or the other 
structure, and when no decision is reached between A.R. and M.A. it will mean that either structure 
will do equally well for purposes of description. We note that the procedure will always give a decision 
regarding the order of the model. 


3. The test procedure proposed has been applied to twenty-eight series, nine artificial A.R. series
taken from Kendall (1949) and two artificial M.A. series from Wold (1949), and seventeen observed series,
taking 1.96 as the critical limit for $|d_s|$ and $|d_{.s}|$. In the case of the artificial A.R. series we reach the
correct result on every occasion with one exception. This exception is Kendall's series 4, when our
decision would be that it is an A.R. of order 3, whereas it is in fact an A.R. of order 2. In one of the
artificial M.A. series too low an order is obtained. In the case of the natural series the decisions reached
by the present method bear comparison with those already reached by other workers. Since these
results are of interest, they have been summarized in Tables 1A and 1B. We note regarding Table 1B
that Wold did not specifically lay down the order of the A.R. and M.A. structures for Cost of Living
and Wheat Prices (Beveridge). He did, however, remark that he thought order 2 for A.R. and order 1
for M.A. would be sufficient. For Beveridge's wheat prices series, according to our tests, the structure
which would be suitable for description may be M.A. (1) or A.R. (2). Following our stated principle,
we decide in favour of the lower order, that is M.A. (1). It is interesting to note that our decisions
agree almost entirely with those of Kendall (1944b) which are based on the correlogram method, but
with only one of the four decisions of Quenouille (1947) which are based on his own method.


Source 

Kendall (1949) 1 
2 

3 

4 

8 

10 

12 

14 

16 

Wold (1949) A 
D 


Table 1A. Artificial series 


Nature of 
series 
A.R. (2) 
A.R. (2) 
A.R. (2) 
A.R. (2) 
A.R. (3) 


Random 
element 


Rectangular 
Rectangular
Rectangular 
Rectangular 
Normal 
Normal 
Normal 
Normal 
Normal 
Normal 
Normal 


Present decision 
A.R. (2) or M.A. (2) 
A.R. (2) 

A.R. (2) 
A.R. (3) 
A.R. (3) 
M.A. (3) or A.R. (3) 
M.A. (3) or A.R. (3) 
M.A. (3) or A.R. (3) 
A.R. (3) 
M.A. (1) or A.R. (1) 
M.A. (1) 


Table 1B. Comparison of decisions reached by present and previous techniques 


Series 
Wheat yields 
Potato yields 
Barley prices 


Oat prices 


Wheat acreage 
Barley acreage 
Oat acreage 
Potato acreage 
Pigs 

Sheep 

Horses 

Cows 


Wheat prices 
Cost of living
Wheat prices (Beveridge) 


Marriages 
Sun spots 


Reference 


Kendall (1944)†
Kendall (1944) 
Kendall (1944) 


Kendall (1944) 


Kendall (1944) 
Kendall (1944) 
Kendall (1944) 
Kendall (1944) 
Kendall (1944) 
Kendall (1944) 
Kendall (1944) 
Kendall (1944) 


Kendall (1944) 
Wold (1938) 
Wold (1938) 


Kendall (1946) 
Yule (1927) 


Present decision 
Random 
Random 
A.R. (2) 


A.R. (4) 


A.R. (2) 

M.A. (1) [A.R. (2)] 
A.R. (2) 

A.R. (2) or M.A. (2) 
A.R. (2) or M.A. (2) 
A.R. (2) 

A.R. (2) 

A.R. (4) or M.A. (4) 


A.R. (2) 
A.R. (2) 
M.A. (1), [A.R. (2)] 


A.R. (4) 
A.R. (2) 


Other decisions 


Kendall: random 
Kendall: random 
Kendall: A.R. (2)
Quenouille: not A.R. (2)*
Kendall: A.R. (2)*
Quenouille: not A.R. (2)
Kendall: A.R. (2)
Kendall: A.R. (2)
Kendall: A.R. (2)
Kendall: A.R. (2)


Kendall: A.R. (2)
Kendall: A.R. (2)
Kendall: A.R. (2)
Kendall: A.R. (2)*
Kendall: A.R. (2)
Quenouille: A.R. (2)


Wold: A.R.
Wold: M.A.

Quenouille: not A.R. (2)*
Kendall: A.R. (2)
Kendall: A.R. (2)

Yule: A.R. (2)


* Denotes present and past decisions do not agree. 


† The reference is to Kendall's (1944b) paper.


4. As an illustration of the method of procedure we consider the case of the sun-spot series, the last
referred to in Table 1B. There are 176 observations. The first five serial correlations are 0.811, 0.434,
0.0316, 0.264, and -0.404.

Stage I. Test for randomness:
$$\frac{0.811}{1/\sqrt{176}} = 10.8,$$
a result which is obviously significant.

Stage II. The estimated standard error of $r_2$ ($= 0.434$) in the M.A. test is
$$\left\{\frac{1}{176}\,(1+2(0.811)^2)\right\}^{1/2} = 0.115,$$
and $$d_2 = \frac{0.434}{0.115} = 3.77,$$
which is significant. We proceed to test also for autoregression. The results of the calculations are
summarized in Table 2.


Table 2. Discrimination in the case of Wolfer's sun-spot series

                        Autoregression                                        Moving average
         Partial       S.E. of the                                Serial        S.E. of the
Stage    correlation   correlation   Ratio     Result             correlation   correlation   Ratio    Result
I         0.811        -             -         -                   0.811        0.0754        10.75    Significant
II        0.653        0.057         -11.51    Significant         0.434        0.115          3.77    Significant
III      -0.101        0.075          -1.36    Not significant     0.0316       0.124          0.26    Not significant
IV        0.013        0.075           0.18    Not significant     0.264        0.124          2.13    Significant
V        -0.050        0.075           0.66    Not significant    -0.404        0.127          3.18    Significant
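
The moving-average half of Table 2 can be reproduced from the five serial correlations. The sketch below is illustrative only: it assumes $N = 176$ and takes the standard error of $r_s$ to be $\{(1+2r_1^2+\dots+2r_{s-1}^2)/N\}^{1/2}$, as in the worked Stage II calculation; the function name is hypothetical.

```python
import math

def moving_average_criteria(r, n):
    """For each s = 1, 2, ... return (standard error of r_s, d_s), where the
    standard error is sqrt((1 + 2*(r_1^2 + ... + r_{s-1}^2)) / n)."""
    rows = []
    running = 0.0                     # r_1^2 + ... + r_{s-1}^2
    for r_s in r:
        se = math.sqrt((1.0 + 2.0 * running) / n)
        rows.append((se, r_s / se))
        running += r_s * r_s
    return rows

serial = [0.811, 0.434, 0.0316, 0.264, -0.404]
rows = moving_average_criteria(serial, n=176)
standard_errors = [round(se, 3) for se, _ in rows]
significant = [abs(d) > 1.96 for _, d in rows]
```

The standard errors round to the tabulated values, and the pattern of significant and non-significant ratios matches the moving-average column of the table (the ratios themselves differ slightly in the second decimal because the table divides by rounded standard errors).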


Following the procedure which we have outlined in the first section it is seen that we should stop
at stage III when neither the A.R. nor the M.A. test gives a significant result, and our conclusion should
be, strictly, that the structure could be A.R. or M.A. of order 2. Certain theoretical considerations
suggest some modification in this automatic procedure. It seems adequate to stop the autoregressive
tests as soon as a non-significant result is reached; but for the moving average scheme, although in
theory calculations may be halted as soon as there is a non-significant result, in practice it may be an
advisable precaution to calculate one or two further successive results in order to be on the safe side.
Thus if non-significance is reached with a moving average scheme at the sth stage, in practice we
might proceed to the (s+1)th and the (s+2)th stage. If at either of these stages a significant result is
reached, we should reject the non-significance at the sth stage. Thus in the example worked out in
Table 2 we should neglect the non-significant result of the moving average scheme at stage III in
view of the significant results at stages IV and V, and we should therefore declare ourselves in favour
of an autoregressive scheme of order 2. The reason for this modification is that, as is well known, the
serial correlations of an A.R. series oscillate about zero, so that the test at a certain stage might appear
to support an M.A. structure. Continuance of the series of tests avoids this criticism. If the partial
correlations of M.A. series oscillated about zero in the same way it would correspondingly be desirable
to continue testing beyond the point at which a decision had apparently been reached.

If we proceed beyond stage I with Wold's two artificial M.A. series referred to in Table 1A, we do
not reach significant results. This gives a correct interpretation for series A which is M.A. (1), but
a wrong interpretation for series D which is M.A. (4). This last series is of some interest. Wold gives
$$\rho_1 = 0.60,\quad \rho_2 = 0.09,\quad \rho_3 = -0.15,\quad \rho_4 = -0.10,$$
whereas $r_1 = 0.661$, $r_2 = 0.231$, $r_3 = 0.047$, $r_4 = 0.144$.
Were we in a position to use the correct value of $\rho_1$ at stage II instead of $r_1$, we should have found $d_2$
just significant at the 5% level and concluded that the series was M.A. (2). However, it is clear that
with such low values of $\rho_2$, $\rho_3$ and $\rho_4$ the chance of proceeding beyond stage I is bound to be small
when, as in this case, the total number of observations is only $N = 125$.


5. In any scheme of testing hypotheses it is of basic importance to assess the chances of reaching
incorrect conclusions. In the present case it is also important to divide the incorrect conclusions into
different categories. As a result of each sequence of tests, one of an infinite number of possible decisions
will be made, e.g. 'the series is M.A. (2)' or 'the series is either M.A. (3) or A.R. (3)'. Given a series of
known structure, the probabilities of these various types of decision describe the operation of the
test. Tables 3 and 4 show approximate* probabilities of various types of decision when the true
structure is (A) M.A. (1), (B) A.R. (1). The sample size in each case is 100. Each row of the table refers
to a different structure, defined by the value of the first-order serial correlation coefficient. In Table 3
a nominal 5% significance level is supposed to be used in each test of the sequence, and in Table 4

* In the calculation of these probabilities it was assumed that the joint distribution of $d_1$, $d_2$ and $d_{.2}$
is multi-normal, with parameters given by large sample approximations. In calculating the probabilities
of joint events such as '$d_1$ significant, $d_2$ not significant, $d_{.2}$ significant' (which would lead
to the conclusion M.A. (1)), the expansion of the truncated trivariate normal integral due to Moran
(1948) was used.

Table 3A. True structure M.A. (1). Sample size 100 (5% level)

                 Incorrect decisions                    Correct decisions
                                               Strictly    Ambiguous: M.A. (1)
$\rho_1$    Random   A.R. (1)   Order > 1      M.A. (1)    or A.R. (1)            Total
0.5         0.00     0.00       0.05           0.91        0.04                   0.95
0.4         0.00     0.03       0.02           0.47        0.48                   0.95
0.3         0.12     0.04       0.01           0.14        0.70                   0.84
0.2         0.48     0.02       0.00           0.04        0.46                   0.49
0.1         0.83     0.01       0.00           0.01        0.15                   0.16
0.0         0.95     0.00       0.00           0.00        0.05                   0.05
(random)

Table 3B. True structure A.R. (1). Sample size 100 (5% level)

                 Incorrect decisions                    Correct decisions
                                               Strictly    Ambiguous: A.R. (1)
$\rho_1$    Random   M.A. (1)   Order > 1      A.R. (1)    or M.A. (1)            Total
0.9         0.00     0.00       0.05           0.95        0.00                   0.95
0.8         0.00     0.00       0.05           0.95        0.00                   0.95
0.7         0.00     0.00       0.05           0.92        0.03                   0.95
0.6         0.00     0.01       0.03           0.77        0.18                   0.95
0.5         0.00     0.02       0.03           0.51        0.44                   0.95
0.4         0.01     0.04       0.01           0.26        0.68                   0.94
0.3         0.14     0.04       0.01           0.11        0.71                   0.82
0.2         0.48     0.02       0.00           0.03        0.46                   0.49
0.1         0.83     0.01       0.00           0.01        0.15                   0.16
Table 4A. True structure M.A. (1). Sample size 100 (1% level)

         Incorrect decisions             Correct decisions
         -----------------------------   -------------------------------
                                                     Ambiguous
                                         Strictly    A.R. (1) or
ρ₁       Random   A.R. (1)   Order > 1   M.A. (1)    M.A. (1)      Total
0.5      0.00     0.00       0.01        0.85        0.14          0.99
0.4      0.04     0.01       0.00        0.26        0.69          0.95
0.3      0.31     0.01       0.00        0.04        0.64          0.68
0.2      0.73     0.00       0.00        0.00        0.27          0.27
0.1      0.95     0.00       0.00        0.00        0.05          0.05
0.0      0.99     0.00       0.00        0.00        0.01          0.01
(random)

Table 4B. True structure A.R. (1). Sample size 100 (1% level)

         Incorrect decisions             Correct decisions
         -----------------------------   -------------------------------
                                                     Ambiguous
                                         Strictly    A.R. (1) or
ρ₁       Random   M.A. (1)   Order > 1   A.R. (1)    M.A. (1)      Total
0.9      0.00     0.00       0.01        0.99        0.00          0.99
0.8      0.00     0.00       0.01        0.98        0.01          0.99
0.7      0.00     0.00       0.01        0.86        0.13          0.99
0.6      0.00     0.00       0.01        0.57        0.42          0.99
0.5      0.00     0.01       0.00        0.28        0.71          0.99
0.4      0.06     0.01       0.00        0.11        0.82          0.93
0.3      0.33     0.01       0.00        0.03        0.64          0.67
0.2      0.72     0.00       0.00        0.00        0.28          0.28
0.1      0.94     0.00       0.00        0.00        0.06          0.06

a nominal 1% significance level. The tables (A) start from ρ₁ = 0.5, as it is impossible to have |ρ₁| > 0.5 
for M.A. (1). 
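
The bound on ρ₁ for a first-order moving average can be checked numerically. The sketch below assumes the usual parametrization x_t = e_t + θ e_{t-1}, for which ρ₁ = θ/(1 + θ²); the symbol θ, the function name and the grid are illustrative choices, not notation taken from the paper.

```python
def ma1_rho1(theta: float) -> float:
    """First-order serial correlation of x_t = e_t + theta * e_{t-1}."""
    return theta / (1.0 + theta * theta)


# Scan theta over a grid from -5 to 5 (hypothetical range chosen for illustration).
grid = [i / 100.0 for i in range(-500, 501)]
max_abs_rho1 = max(abs(ma1_rho1(t)) for t in grid)

# The largest |rho_1| on the grid is attained at theta = +1 or -1.
print(max_abs_rho1)
```

On this grid the maximum of |ρ₁| is reached at θ = ±1, which is why no M.A. (1) row can be given for ρ₁ above 0.5.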

It will be noted that the probability of a strictly correct decision rises to a maximum of 1 minus the 
nominal level of significance. This is because there will always be a chance, equal to the nominal 
significance level, of finding the second order (or partial second order) correlation significant when the 
value of the corresponding population parameter is zero. This suggests that a rather low nominal 
significance level should be used as this will increase the chance of a correct decision when ρ₁ is big 
enough, while decreasing the chance of a correct decision for low values of ρ₁, where the series differ 
comparatively little from a random series. 
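
As a rough consistency check on the printed figures, the following sketch takes the Table 3A rows as given and verifies that the probabilities in each row sum to one, that 'Total' equals 'Strictly' plus 'Ambiguous', and that the strictly correct probability never exceeds 1 minus the nominal level. The tolerance is an assumed allowance for two-decimal rounding, not a value from the paper.

```python
# Table 3A rows as printed: (rho_1, random, A.R.(1), order > 1,
# strictly M.A.(1), ambiguous, total correct).
TABLE_3A = [
    (0.5, 0.00, 0.00, 0.05, 0.91, 0.04, 0.95),
    (0.4, 0.00, 0.03, 0.02, 0.47, 0.48, 0.95),
    (0.3, 0.12, 0.04, 0.01, 0.14, 0.70, 0.84),
    (0.2, 0.48, 0.02, 0.00, 0.04, 0.46, 0.49),
    (0.1, 0.83, 0.01, 0.00, 0.01, 0.15, 0.16),
    (0.0, 0.95, 0.00, 0.00, 0.00, 0.05, 0.05),
]
NOMINAL_LEVEL = 0.05
TOLERANCE = 0.015  # assumed allowance for rounding to two decimals


def check_row(row):
    """Return True if the row is internally consistent up to rounding."""
    _, random_, other_type, higher_order, strict, ambiguous, total = row
    total_ok = abs((strict + ambiguous) - total) <= TOLERANCE
    row_sum = random_ + other_type + higher_order + strict + ambiguous
    return total_ok and abs(row_sum - 1.0) <= TOLERANCE


all_rows_consistent = all(check_row(r) for r in TABLE_3A)
max_strict = max(r[4] for r in TABLE_3A)
print(all_rows_consistent, max_strict <= 1 - NOMINAL_LEVEL + 1e-9)
```

The same check could be repeated for Tables 3B, 4A and 4B by substituting their rows and the corresponding nominal level.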

It will be observed that the chance of a decision which is correct in order but wrong in type is small, 
as is also the chance of a wrong decision regarding order, except for small values of ρ₁. On the other 
hand, in quite a large proportion of cases we shall be only ambiguously correct, in that we shall be 
unable to distinguish between the two types A.R. and M.A., although the order will be correctly 
determined. 

These probabilities apply to the case of series of order 1. Similar figures for series of higher order 
are in the process of being calculated. 

In conclusion I should like to convey my warm thanks to Dr N. L. Johnson and Dr F. N. David for 
invaluable suggestions made during the course of preparing this paper. 
A note on ‘The estimation of the parameters of tolerance distributions’ 


By D. J. FINNEY 


I am indebted to Dr P. M. Grundy for drawing my attention to a careless slip in my paper in Biometrika, 
vol. 36, pp. 239-56. Unfortunately, this error had several consequences, and the following corrections 
should be noted. 

On p. 243, line 34, the term to be added to Sty should be (— KqF’’/F). 

In equations (27) and (28) on p. 244, the symbol S_yy should be interpreted as SW(y − ȳ)² increased by 

K²(1 − q)²F″/F 

for a batch of control subjects (x → −∞) and by 


for a batch of subjects with x → +∞. The degrees of freedom are correctly stated. 
Immediately below (41) on p. 246, a statement should appear that S_yy must be increased by 
n,(c − C)² 
o(1 − C) . 
In equation (43) on p. 249, the quantity 231.55 should be increased to 231.59, so giving 5.64 for the 
value of χ². 
Two lines below equations (51) on p. 251, a statement should appear that in forming χ² the value 
of S_yy must be increased by (8 − N)² 
yw . 
In equation (54) on p. 254, the addition to S_yy is (1070 − 1029.3)²/1029.3, or 1.61; hence 906.71 should 
be replaced by 908.32 and χ² becomes 24.16. The heterogeneity factor on line 3 of p. 255 ought therefore 
to be 6.04, and the numerical values of standard errors and fiducial limits that follow need the obvious 
consequential alterations. 
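
The arithmetic of the correction to equation (54) can be reproduced directly. In the sketch below the variable names are hypothetical labels for the quoted figures, and the 'implied divisor' is merely the ratio of the corrected χ² to the heterogeneity factor, an inference that the note itself does not state.

```python
# Figures quoted in the correction to equation (54); names are illustrative.
a, b = 1070.0, 1029.3

# Term to be added to the sum of squares: (a - b)^2 / b.
addition = (a - b) ** 2 / b

# Previously printed value plus the addition.
corrected_sum = 906.71 + addition

# Ratio of the corrected chi-square to the quoted heterogeneity factor
# (an inferred quantity, not given explicitly in the note).
implied_divisor = 24.16 / 6.04

print(round(addition, 2), round(corrected_sum, 2), round(implied_divisor, 2))
```

Rounded to two decimals, the addition and the corrected sum agree with the values 1.61 and 908.32 given in the note.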
REVIEWS 


Proceedings of the Second Berkeley Symposium on Mathematical Statistics and 
Probability. Edited by J. Neyman. University of California Press. 1951. Price $11. 


Almost immediately after the last war a symposium on Mathematical Statistics and Probability was 
held in the Statistical Laboratory, University of California, Berkeley. The avowed object of this 
Symposium was a review of such research work as had been carried on during the war years with the 
idea of refreshing and bringing up to date those research workers who had had their attention occupied 
elsewhere. These proceedings were published as the First Berkeley Symposium, and this publication 
has proved useful to many beginners as well as to readers of more maturity. The objective of the 
Second Symposium is not so clear cut. The urgency of the feeling of the lost years has passed—this is 
apparent in the individual papers—and we have instead a volume of collected papers which clearly owe 
their inception to post-war research and any one of which might have been published in a contemporary 
scientific journal. What makes the volume of some interest, however, is that not all the papers would 
be printed in the same journal, so that in reading the volume we get an absorbing cross-section of 
Californian statistics in 1951. 

There are eight groups: mathematical statistics, probability, astronomy, biometry, econometrics, 
physics, traffic engineering, and wave analysis. The papers on mathematical statistics and probability 
present few new fundamentals and are concerned principally with the development of theories which 
have been delineated elsewhere. We would call the attention of readers to the paper of W. Feller, 
‘Diffusion processes in Genetics’, in which the theory of stochastic processes is applied to problems of 
population growth with a lucidity and clarity of exposition rarely equalled in statistical literature. 
The reviewer found the biometry section consisting of a paper by W. G. Cochran and a paper by 
J. Berkson to be by far the most interesting of any. Cochran writes on improvement by means of 
selection and appears to begin where K. Pearson left off in 1902. Cochran’s paper is stimulating and 
thought-provoking, and should remind many of us of the interesting statistical problems which have 
been left on one side unsolved by the present fashions in distribution theory. 

Dr Berkson’s paper will cause some statisticians to frown and others to laugh, depending on whether 
they regard maximum likelihood as an article of dogma or merely one useful statistical tool among 
many. He writes on ‘The relative precision of minimum chi-square and maximum likelihood estimates 
of regression coefficients’, and in his usual clear-thinking way proposes a number of questions the 
answers to which are not immediately obvious. 

To those who are not interested in biometry there is a choice of five papers in astronomy (Lindblad, 
Struve, Scott, Trumpler, Henyey), three on econometrics (Kuhn and Tucker, Marschak, Arrow), five 
on physics (Feynman, Lewis, Fériet, Lenzen, Placzek), two on traffic engineering (Berry and Belmont, 
Forbes) and two on wave analysis (Rudwick, Seiwell). The volume does, in fact, contain something of 
interest for all types and varieties of statistician and its possession will be coveted by many. It is 
unfortunate that its price will put it beyond the reach of most students. 

F. N. DAVID 


Applied Statistics. A Journal of the Royal Statistical Society. Vol. 1, nos. 1 and 2, 
edited by Leonard H. C. Tippett. Edinburgh and London: Oliver and Boyd Ltd. 
Single number 10s., annual subscription 25s. 


Faced in the last 20 years with ever widening horizons the Royal Statistical Society has shown 
a welcome readiness to experiment and expand. The Industrial and Agricultural Research Section, 
with its associated Supplement to the Journal, was founded in 1933 with the object of providing greater 
opportunity for discussion of the more mathematically based techniques of statistics which had already 
begun to have a profound influence in biological, medical and agricultural research and which were 
just beginning to find their way into the production and research sides of industry. In the early 
meetings of the I.A.R.S. and in the pages of the Supplement considerable effort was made to illustrate 
in simple form the way in which theory could be of practical service in a wide range of fields of 
application. Such pioneer expository work requires, however, a great deal of effort, and by 1939 the 
organizing committee of the Section was already finding it difficult to provide a series of papers which 
could hold together the mathematical and industrial statisticians. 
The war made a complete break in the Section’s activities, and when it was over the interest in all 
branches of statistical activity had increased enormously. The time had come for the next step forward; 
a Research Section of the Society was formed whose primary interest was to lie in the development of 
the more mathematical techniques; in 1949 it was given as a means of publication the new Section B 
(Methodological) of the Journal. At the same time an Industrial Applications Section was created with 
branches holding meetings in London and other centres in England and Wales. The Study Group also 
continued as a ‘Section’ in London and later in Bristol. But there was still a gap in the structure, 
a publication which would appeal to all those working statisticians whose interest lies in the application 
of newly developed methods rather than in the underlying theory. 

The third step forward has therefore been taken by the Council in publishing, with the help of 
Messrs Oliver and Boyd Ltd., the new periodical, Applied Statistics. Its aim, as last year’s President 
of the Society writes in the Foreword to the first number, is 
‘to meet the needs of all workers concerned with statistics—not of professional statisticians only but 
also of those innumerable workers in industry, commerce, science, and other branches of daily work, 
who must handle and understand statistics as part of their tasks. Its aim, in short, is to present in 
one way or another but always simply and clearly, the statistical approach and its value, and to 
illustrate in original articles modern statistical methods in their everyday applications.’ 

The journal has a widely representative editorial committee, headed by L. H. C. Tippett. Two of the 
parts of volume 1 have now been issued and provide very good reading. The contents are made up of 
a short Editorial, six or seven main articles, Notes and Comments, Reports of the activities of the 
Industrial Applications Section and the Study Section, and Book Reviews. As the editor writes, it is 
in the hands of the readers to develop ‘Questions and Answers’ and ‘Letters to the Editor’ into lively 
features. 

There is great variety in the thirteen main articles so far published. It is fitting that first place should 
have been given to an article on ‘The introduction of statistical methods in industry’, by Bernard 
Dudding, whose tireless energy and enthusiasm over many years has meant so much to the cause of 
applied statistics. The account of the early development of statistical ideas at the General Electric 
Company is interesting. Many good things in statistics have appeared independently in duplicate or 
triplicate at different times and places, and allowing for the wide differences in setting between the 
Dublin Brewery and the Research Laboratories at Wembley, Dudding’s story contains much that is 
reminiscent of Gosset’s, 20 years before. 

As a sample of the other contributions one may mention: two regarding some problems tackled by 
the Social Survey, by Leslie Wilkins and by Thomas Corlett; an account of methods of estimating the 
future population, by Peter Cox; a description by Roy Allen of some of the methods used by the Cost 
of Living Technical Committee in reporting on index numbers of retail prices; an account, with 
mathematical background, of a method of measuring the accuracy of the systematic sampling of material 
from a conveyor belt, by Geoffrey Jowett; and finally, from George Dyke and Emily Simpson, the 
description of an investigation, co-ordinated at Rothamsted, into the differences which may occur 
when a standard analytical technique is carried out at different chemical laboratories all over the world. 

The journal is attractively printed; it has every prospect of being able to maintain the high level of 
its articles; we must hope for it an increasingly wide circulation. While it is primarily intended for the 
applied statistician, it provides also an easy and pleasant means for his more mathematically minded 
colleague to keep in touch with the varied interests of the great majority of those whose job is to handle 
and understand statistical data at ordinary level. 

E. S. PEARSON 


Introduction to Statistical Analysis. By Wilfrid J. Dixon and Frank J. Massey 
Jr. x + 370 pp. McGraw-Hill Book Company, Inc. $4.50. 


A number of text-books suitable for basic courses in statistical methods are making their appearance on 
the other side of the Atlantic. On the whole, they are a decided improvement on similar text-books 
issued before the last war; knowledge of the modern methods of mathematical statistics is spreading, 
and this has had its effect on the methodological text-book. The book under review compares favourably 
with others of its kind. It follows the normal pattern of chapter contents, except that the variance is 
dealt with before the mean. It is not altogether clear why this should be so, except that, possibly, 
a homogeneity test of the variance estimates in two samples should be undertaken before the extended 
‘Student’ t-test is made. The analysis of variance is introduced, as is the fashion nowadays, but the 
treatment is brief, stereotyped and lacks reality; an even briefer chapter on the analysis of covariance 
does not succeed in making clear to the reader in what respects this particular tool assists the 
experimenter. There are a number of miscellaneous chapters at the end. One, entitled ‘macro-statistics’, 
deals with punched card machinery. The reason for depicting the horrible objects on pp. 224-9 is not 
clear, unless it be to warn off the reader against having anything whatever to do with punched cards. 
Perhaps the chief weakness of the book is that in many respects it leaves the reader unconvinced. 
Many examples are cited in illustration of a succession of methods, but it is not clear in all cases why 
such methods may be necessary. The experiments which have been chosen for illustration are themselves 
not very realistic, and the authors do not really get down to an important part of the subject, namely, 
discussing what has been learnt as a result of the arithmetical analysis. In general, however, the methods 
used are sound, and the mathematics free from criticism. The book contains an extensive collection of 
tables, much more than the usual elementary text; it is thus self-contained. It is well printed and 
seems to be free from errors. 

JOHN WISHART 


CORRIGENDA 
E. Lord, Biometrika (1947), 34 


p. 66, Table 9. The figure in the column for α = 0.10 and the row for n = 2 
should read 3.157 instead of 3.196. 